Chapter II:
Testing the generalizability of the choice overload hypothesis


As outlined in the previous chapter, the effect of too much choice has important practical and theoretical implications, yet its theoretical underpinnings are debated. Likewise, even though its empirical foundation is growing, there are also a number of divergent findings. These conceptual shortcomings are in contrast with the increasing amount of attention the effect has received from both inside and outside psychology (Botti & Iyengar, 2006; Kuksov & Villas-Boas, 2005; Lane, 2000; Mick et al., 2004; Schwartz, 2004).

Need for a model 

Clearly, a precise and testable model of the underlying psychological processes and mechanisms would be highly desirable. Yet, before starting to build such a model, it is important to ensure that the effect of too much choice is robust and replicable. This is especially important given that, as mentioned in Chapter I, there is a considerable body of empirical evidence backed by sound theoretical arguments that speak in favor of large assortments.

Need for replication

According to the statistician Ronald Fisher, “no isolated experiment, however significant in itself, can suffice for the experimental demonstration of any natural phenomenon” (Fisher, 1971, p. 13). Along the same lines, Levin (1998) proposed that “instead of measuring the quality of research by the level of significance, it would be better judged by its consistency of results in repeated experiments” (p. 92). Similarly, other scholars argue that scientific findings rest upon replication and they recommend skepticism about nonreplicated results (Evanschitzky, Baumgarth, Hubbard, & Armstrong, 2007). Following these calls, I will subsequently describe a series of three studies that were intended to empirically test the replicability of the effect of too much choice across different contexts and choice situations in the lab and in the field.

Jam study



To test the generalizability of the too-much-choice effect across different contexts, I first sought a situation in which the a priori probability of finding the effect would be high. In the jam study reported by Iyengar and Lepper (2000), 3% of people exposed to the large assortment made a purchase versus 30% of those exposed to the small assortment. The corresponding effect size is d=0.77, which Cohen (1977) operationally defines as a large effect. Therefore, I strived to replicate that study as closely as possible.


Experimental setup

My experimental setup closely followed the one described by Iyengar and Lepper (2000, Study 1). The study took place on two consecutive Saturdays on the sales floor of an upscale grocery store in Berlin that is famous for its extraordinary assortment size. At the entrance to the store a table was placed on which a variable number of jams in jars were displayed. To reduce the chances of participants having strong prior preferences, I chose a brand of high-quality jam (Lafayette Confiture) that offers many different exotic flavors and that was only sold at that particular store. The jars were lined up in random order. The name of the flavor was written on the jar and on a paper tag in front of it. A sign above the table invited customers to stop and taste the jams. Each customer who stopped at the table received a coupon, valid for one week, to purchase any Lafayette Confiture for a reduced price. Jams for purchase were found on a shelf elsewhere in the store. On each Saturday, the table was operated by two female assistants recruited from a local university. The assistants were paid a regular hourly wage. Although they knew that the data would be used for scientific purposes, they were unaware of the specific hypotheses of the study. Every customer who approached the tasting table received a coupon from one of the assistants and was counted as a participant in the study, even if he or she decided not to taste any jam.

Dependent and independent variables

The number of jams displayed and the value of the coupon were both subject to experimental manipulation and thus constitute the two main independent variables. The value of the coupon was fixed at 1.0 euro on the first Saturday and 0.50 euro on the second Saturday. The regular price of Lafayette Confiture was 3.90 euros for all flavors. The number of jams on the table was either 6 (small assortment) or 24 (large assortment). The two assortment sizes were switched on an hourly basis. On each of the two Saturdays the study was run for 8 hours, so each assortment size was on display for 4 hours. The number of redeemed coupons represents the main dependent variable and was taken as a measure of purchase motivation. By using a small differentiating mark on the coupons handed out, the number of redeemed coupons could be counted separately for each condition and for each gender.

Large assortment


As the total number of different flavors of Lafayette Confiture is 24, the large assortment consisted of all available flavors of the brand. In contrast to Iyengar and Lepper, I deemed it unnecessary to take out the most common flavors because all the flavors of Lafayette Confiture are very exotic. In fact, the brand was tailor-made for the store to complement their regular jam assortment with flavors that are not offered by any other manufacturer. During the whole time of the study, all 24 flavors of Lafayette Confiture were constantly available on the jam shelves in the store.

Small assortment

The jams that made up the small assortment were chosen based on a pretest similar to the one used by Iyengar and Lepper: In the pretest, 42 students from a local university were given a list with the names of all 24 flavors of Lafayette Confiture. Out of that list, each student had to indicate the four “best-sounding” flavors, four “good- but not excellent-sounding” flavors, and four “worst-sounding” flavors. In exchange for their participation, students received a chocolate bar. Based on the aggregated data, the two most attractive, two least attractive, and two medium attractive flavors were selected for the first small set (ss1). This procedure was chosen to exactly replicate Iyengar and Lepper’s study, but it proved rather imprecise due to overlap in the classification of jams (e.g. some flavors ranked equally high in attractiveness and unattractiveness). To counteract this imprecision, a second, alternative set of six jams was randomly selected (ss2). Within the 4 hours that the small assortment was on display each day, sets ss1 and ss2 were displayed for 2 hours each.

Additional measures

In extension to the experimental setup used by Iyengar and Lepper, I numbered all coupons consecutively. Because the numbering was hidden within a pseudo barcode printed on each coupon, it could hardly be noticed by the consumers. Based on the numbering, the assistants at the table discreetly recorded the jam flavor(s) that each consumer tasted. The cashiers at the store’s exit noted the flavor of each purchased jam on the back of the coupon that was used for this purchase. While the vast majority of participants did not take notice of these recordings, the few that did were told what type of data was recorded and that it would be used for the purpose of market research.




In total, 504 customers (297 female, 207 male) were included in the study: 193 on the first Saturday and 311 on the second Saturday. Across both Saturdays, 239 participants saw the large assortment, 128 saw the small assortment ss1 and 137 saw the ss2 assortment. Across all conditions, 33% of all participants redeemed their coupon and 60 participants (12%) did not taste any jam. Those who did tasted 1.7 jams on average and 74% of all participants tasted between one and two jams.

There were almost no differences between the two small assortments ss1 and ss2 in terms of gender of tasters, number of jams tasted, and percentage of redeemed coupons. Therefore, the data of the two sets was collapsed for subsequent analyses into one “small assortment.”

Effect of coupon value

There was a main effect of coupon value: 46% of all participants redeemed a coupon that was worth 1.0 euro, while this percentage dropped to 24% when the coupon value was 0.50 euro, t(504)=5.07; p<.001. However, as the coupon value is confounded with the day of the study (first vs. second Saturday), the effect could also be due to the higher number of participants on the second Saturday. On the second Saturday, participants tasted fewer jams (an average of 1.3 jams compared to 1.9 on the first Saturday), t(502)=6.75; p<.001, which could be due to the lower coupon value but also to the higher number of people at the tasting table, which might have prompted people to give others a chance to taste. Nonetheless, no effect of assortment size was seen in the second session either (Figure 3).

Effect of assortment size


Across both Saturdays, participants who saw the large assortment tasted slightly more jams than participants who saw the small assortment (1.6 vs. 1.4 jams), t(502)=2.37; p=.018. However, there was no effect of assortment size on the number of redeemed coupons (32% in the large condition vs. 33% in the small condition), t(504)=0.19; p=.853; Cohen’s d=0.03. There was no interaction effect between the number of redeemed coupons and coupon value or taster’s gender. Independent of assortment size there was a small positive relationship between the number of jams tasted and the probability of redeeming a coupon (r=.26).

Figure 3: Effect of assortment size and coupon value on the percentage of redeemed coupons compared to the findings of Iyengar and Lepper (2000)

Match between the flavors tasted and flavors purchased

In the large condition with all 24 jams on display, 77 participants redeemed coupons. Of these, 26 participants (34%) purchased a jam that they had actually tasted at the booth and the other 66% of the participants purchased a jam they had not tasted but which was displayed at the table. Of the 80 participants who redeemed a coupon in the small conditions, 31 (39%) bought a jam that they had tasted and 34 (43%) bought a jam that was displayed at the booth but that they did not taste. The remaining 15 participants (19%) bought a jam that was not on display at the booth but only on the shelf of the store.



Despite the fact that the study closely followed the setup used by Iyengar and Lepper, I did not find a relationship between assortment size and motivation to purchase. The relationship between purchases and coupon value suggests that the experimental manipulation was successful, yet the effect size of the set-size manipulation (Cohen’s d=0.03) is in sharp contrast to the strong effect reported by Iyengar and Lepper (30% vs. 3%, d=0.77). Under the assumption that Iyengar and Lepper’s findings reflect the actual population effect size, and assuming an alpha of 0.05 (one-sided), the power (1-β) of my experiment would be greater than 0.995. In other words, in my study the probability of finding an effect of Iyengar and Lepper’s magnitude, and thereby correctly rejecting a false null hypothesis, was very high. In fact, even under the assumption of a small effect of d=0.3, the power of my experiment would have been about 0.95, which is still far higher than the convention of 0.8 proposed by Cohen (1977).

This prior analysis of statistical power shows that the different findings are probably not due to random variation in the data or mere chance, which leads to the question of what might have been the reason for the divergent results.
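The power figures reported above can be approximated with a short calculation. The following is a minimal sketch only: it assumes a one-sided two-sample test under a normal approximation, which is not necessarily the procedure used for the original analysis, and the function name `power_one_sided` is introduced here purely for illustration. The group sizes are the ones reported for the jam study (239 participants in the large condition, 128 + 137 = 265 in the two small sets combined).

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided(d, n1, n2, alpha=0.05):
    """Approximate power of a one-sided two-sample test.

    Assumption: normal approximation to the test statistic; an exact
    calculation based on the noncentral t distribution would differ slightly.
    """
    z = NormalDist()
    # Expected value of the test statistic if the true effect size is d
    noncentrality = d * sqrt(n1 * n2 / (n1 + n2))
    # Critical value of a one-sided test at level alpha
    z_crit = z.inv_cdf(1 - alpha)
    return z.cdf(noncentrality - z_crit)


# Jam study group sizes: large assortment vs. both small sets combined
power_large_effect = power_one_sided(0.77, 239, 265)
power_small_effect = power_one_sided(0.30, 239, 265)
print(round(power_large_effect, 3), round(power_small_effect, 3))
```

Under these assumptions the approximation yields a power above 0.995 for d=0.77 and of roughly 0.95 for d=0.3, in line with the values stated in the text.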

Different types of jam

It could be that the very type of jam made a difference. Whereas Iyengar and Lepper used Wilkin and Sons jam, I used Lafayette Confiture. While both brands are of high quality and almost equal in price, Lafayette Confiture comprises more exotic flavors than Wilkin and Sons. But given that unfamiliarity with the available options is seen as a prerequisite of the too-much-choice effect because it circumvents preference matching, this should have boosted rather than diminished the effect in my study.

Differences in the small assortment


One important factor that may contribute to the presence or absence of the too-much-choice effect is the composition of the small choice set determined by the name-rating pretest. The rationale behind that pretest was to create a small assortment that had a wide range with regard to perceived attractiveness of flavors. Whereas the pretest relied solely on names, the participants in the store could see (and maybe also smell) the jams. This could have changed the perception of attractiveness and it brings the validity of the pretest into question. With the data on hand, attractiveness can be operationalized in different ways, for instance, based on the number of times a jam on the table was tasted (visual attractiveness) or based on the number of times a jam was purchased (purchase attractiveness). Across all jams, visual attractiveness and purchase attractiveness are positively correlated with r=.66 but the attractiveness ratings of the pretest do not match up with these in-store attractiveness measures. The two jams rated as least attractive in the pretest were in the top quartile of jams that were tasted and bought most often in the store. Of the six jams in the small set, the most attractive jam according to the pretest turned out to be the jam tasted least often at the booth. Nevertheless, in the present study both small sets of jam (ss1 and ss2) still turned out to be widely varied in terms of both purchase and visual attractiveness, but this is largely due to lucky chance given the low validity of the pretest ratings.

These findings have a straightforward implication: If Iyengar and Lepper’s pretest was as invalid as mine, it could have been that by mere chance, they ended up with a small set that consisted of the most attractive jams (in terms of visual—or purchase—attractiveness) in the assortment. In this case, the probability of purchase from the small set could have been artificially increased in their study, which would then be interpreted as a too-much-choice effect. As the pretest data from Iyengar and Lepper is not available, I tested the influence of the attractiveness of the small set in two separate experiments by using restaurants and charity organizations as options to choose from. Both experiments will be outlined in Chapter III below.

Presentation of the jams on the table

In the present study, the jam jars on the tasting table were lined up in an orderly fashion. This setup made it easy for customers to get an overview of the assortment even in the large choice condition. In contrast, in the study by Iyengar and Lepper, the jam jars were displayed in a rather disordered and messy way (Iyengar, personal communication). These differences may have led to two different effects that potentially reduced the choice overload in my study.


First, it can be argued that it is not the objective assortment size that matters but how it is perceived by the decision maker. As mentioned in Chapter I, Kahn and Wansink (2004) showed experimentally that an unstructured display can increase the perception of variety. Thus, the unordered setup used by Iyengar and Lepper might have induced the participants to perceive the choice set as even larger than it was. However, although this difference might explain a quantitative difference in the effect sizes between the studies, it is not sufficient to explain why I did not find any effect at all.

Second, as already mentioned by Iyengar and Lepper, “the display of 24 jams may have aroused the curiosity of otherwise uninterested passers-by” (p. 998). Because at my tasting booth the presentation was in a rather orderly fashion, even the large assortment condition could hardly be mistaken for anything other than a tasting table of jam. Thus, Iyengar and Lepper might be right in their conjecture that the effect in their study was not unique to the number of options but rather occurred because customers were attracted to the tasting booth for very different reasons in the small condition as compared to the large condition, and that consumers who approached the large assortment never intended to make a purchase.

Different expectations of the participants

Both studies were conducted on the sales floor of upscale grocery stores that were comparable in their assortment structure. Draeger’s, the store used by Iyengar and Lepper, at the time offered about 314 different jams while at the Berlin site, 280 jams were available. However, the store I used was located in the very center of Berlin, where, especially on a Saturday, it is visited by a lot of tourists. Draeger’s, on the other hand, is probably more frequented by local people doing their weekly shopping. As a consequence, the participants in the present study might have been much more interested in having a lot of choice and also might have perceived the assortment as exciting and motivating. The participants in Iyengar and Lepper’s study might have had a different motivation, that is, to get through their shopping list and proceed to the exit as soon as possible. It could be that in the latter case, a large assortment would be demotivating, because it takes relatively longer to browse through all the options. I addressed this hypothesis in a follow-up study on the sales floor of a regular day-to-day grocery store in a residential neighborhood that will be outlined in full detail next.

Wine study



Even though I tried to replicate the original Iyengar and Lepper jam study as closely as possible, there are still a number of differences that could explain why I did not find a too-much-choice effect as they did. To account for the possibility that this was because of special features of my experimental site (e.g. tourists as customers), I conducted a follow-up experiment at an organic grocery store in a residential area where people did their daily grocery shopping. Instead of jams, I used wine because even in small shops, wine assortments can be very large, which makes wine an appropriate stimulus for this type of study. Like jam, wine is also a common product, so to reduce the chance that people had strong prior preferences among the experimental assortment, I used exotic varieties, namely, organically grown Spanish red wines.


Small and large assortments

The large assortment consisted of all 12 organically grown Spanish red wines available at the store. The wines were from different regions in Spain and all were within a price range of 4 to 7 euros (approx. $4.80 to $8.40). The wines were in dark-colored bottles that only slightly differed in shape, but each had a different label. Eleven of the bottles had a volume of 0.75 liters and one bottle contained one liter of wine.

Three wines were selected for the small assortment based on a pretest conducted at my institute. There, the 12 bottles of the large assortment were placed on a table next to the entrance of the institute’s canteen. Just as in the main study to come, the name of the vineyard as well as the price for one bottle was printed on a small tag and put in front of each bottle. The first 50 people who passed by the table (mostly researchers and administrative staff) were asked which wine looked most appealing to them. Based on the resulting attractiveness ranking, the most attractive, the least attractive, and one medium-attractive wine were chosen for the small assortment.

Experimental setting


The setup of the main wine study closely followed the setup of the previous jam study. The study was conducted on the sales floor of a large organic grocery store on two consecutive weekends (Friday and Saturday) from 4 pm to 8 pm (a total of 16 hours). A tasting table with the wine was set up just inside the entrance to the store. A sign in front of the table informed customers that there would be a wine tasting on that day. On all 4 days, the tasting was run by a female assistant who was aware of my hypothesis. The assistant handed out the wine in small disposable plastic cups in servings of about 20 milliliters. People were invited to taste as many wines as they wanted and everyone who tasted was asked if he or she wanted to taste another sample of wine. On each of the 4 days, the large and small assortments of wines were rotated on an hourly basis (with the small assortment displayed at 5–6 pm and 7–8 pm).

Everyone who stopped at the tasting table received a coupon to get 1.0 euro off any organically grown Spanish red wine. Each coupon had a unique number and was valid for 1 week. As in the jam study, consumers who decided to purchase a wine had to pick it up from a regular wine shelf at the very end of the store. To make it easier for the customers to remember what they had tasted, the name of the tasted wine(s) was marked on the back of the coupon. The number of redeemed coupons within each condition was taken as a measurement of purchase motivation.

The assistant at the tasting booth recorded which wines were tasted by each participant. For each redeemed coupon, the shop cashier recorded the name of the wine that was bought with that coupon.



In total, during the four afternoons of the study, 280 customers stopped at the tasting table and received a coupon (141 for the large assortment and 139 for the small assortment). Everyone who stopped and received a coupon was counted as a participant in my study. Of the participants, 168 were women. Six participants shopped with a partner; the others were on their own. Out of the 280 participants across both conditions, 172 (61%) tasted a wine and 102 (36%) purchased a bottle. Of the participants who saw the large assortment, 83 (59%) tasted at least one wine and so did 89 (64%) of those participants who saw the small assortment. Among those who tasted, the mean number of tasted wines was 2.1 and in total, 93% of all participants who stopped tasted between one and three different wines. The average number of wines tasted in the large assortment was 2.4 as compared to 1.9 in the small assortment, t(170)=3.4; p=.001; Cohen’s d=0.4.

With regard to the main dependent variable, the number of redeemed coupons, there was hardly any difference between the small and the large assortment. In the large assortment, 54 participants (38%) redeemed a coupon while in the small assortment, 48 (35%) did so, t(278)=0.55; p=.579; d=−0.10.


As in the previous study, no too-much-choice effect could be found. With the sample size on hand and alpha set at 0.05 (one-sided), the statistical power of finding an effect of d=0.77 (the magnitude in Iyengar and Lepper’s jam study) is >.995. For a small effect size of d=0.3, the statistical power would still be about 0.8. This relatively high power makes it unlikely that the null hypothesis was accepted by mistake.
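The statement that the wine study had a power of about 0.8 for d=0.3 can also be read the other way around: given the sample size, which effect size would be detected with 80% power? The sketch below inverts the usual power formula under a normal approximation. This is an assumption made for illustration, as is the helper name `min_detectable_d`; the group sizes (141 and 139) are those of the wine study.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_d(n1, n2, power=0.80, alpha=0.05):
    """Smallest effect size d detectable with the given power.

    Assumption: one-sided two-sample test under a normal approximation.
    """
    z = NormalDist()
    # Sum of the critical value and the quantile matching the target power
    z_sum = z.inv_cdf(1 - alpha) + z.inv_cdf(power)
    return z_sum / sqrt(n1 * n2 / (n1 + n2))


# Wine study group sizes: large vs. small assortment
mde_wine = min_detectable_d(141, 139)
print(round(mde_wine, 2))
```

With these inputs the minimum detectable effect comes out close to d=0.3, which is consistent with the power of about 0.8 reported for a small effect.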


The diverging results occurred despite the fact that the experimental site was a busy store where people did their daily grocery shopping and presumably did not enter for the sake of experiencing a large assortment. This calls into question possible explanations for the lack of effect in my first study that relate to special consumer expectations and the shop environment. However, with a total of 12 bottles, the large assortment might still have been too small. Also, even though I used an exotic wine, people might still have had prior preferences that enabled them to engage in preference matching. To rule out these explanations, it would be advantageous to assess the degree of prior preferences independently from the choice. Also, as already mentioned, it might be the perception of variety that eventually matters rather than the absolute number of options. To collect this kind of data, a more controlled experiment would be necessary.

Both the wine study and the jam study were set up such that customers always had to go to the shelf elsewhere in the store if they wanted to make a purchase. Thus, even the participants in the small condition were eventually confronted with a large assortment. This raises the question of why customers should be affected by the large assortment at the tasting booth but not at the shelf (i.e. why customers making a choice at the small-assortment tasting would not be scared off from selecting their purchase when they had to get it from the usual large-assortment store shelf). To maintain the logic of the experiment, it has to be assumed that the participants decided whether to purchase while at the tasting table and that those who decided to purchase would not reconsider that decision at the shelf.

Yet in the wine study, of the 48 participants in the small assortment who bought a wine, 30 (60%) bought one that was not displayed on the tasting table. In the jam study, this percentage was 19%. This indicates that in both experiments a fair number of participants must have made the final decision in front of the shelf. It can be conjectured that similar data would have been found in Iyengar and Lepper’s jam study, yet as they did not collect data on the exact jams being purchased, it could not be measured.

Jelly bean study



A field experiment like the one outlined above does not allow for strict variable control. To make sure the failed replication was not due to some third variable and to explicitly control for prior preferences and differences in the subjective perception of variety, I switched to laboratory experiments. As outlined in Chapter I, Iyengar and Lepper (2000) also found the effect of too much choice in a well-controlled experiment based on choices for exotic chocolates. In that study, people were less satisfied when choosing from a large assortment as compared to a small one. Also, people who chose from a large assortment were less likely to accept chocolate rather than money as compensation. For both of these dependent variables the effect sizes were high; therefore I next aimed to replicate that study. My experimental setup resembled the study conducted by Iyengar and Lepper. The main difference between their experiment and mine was that I used Jelly Belly® jelly beans instead of chocolate.


Similar to the original experiment, the task in my study was to choose, eat, and rate one jelly bean out of an assortment of 6 or 30 different flavored jelly beans (between participants). The beans were presented on a tray that was divided into small sections of equal size. Each section contained one bean and a label with the name of the flavor. For the large assortment, a large tray was used (5 rows of 6 beans in 60 × 60 centimeters) while the small assortment was presented on a small tray (1 row of 6 beans in 60 × 12 centimeters). Five small assortments were used, and each was a subset of the large assortment such that each bean in the large assortment was equally often presented (across participants) in the small assortments. This setup closely resembles that of Iyengar and Lepper.
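One way to construct five small assortments so that every flavor of the large assortment appears equally often is to split the 30 flavors into five disjoint sets of six. The sketch below illustrates this under the assumption that the small sets did not overlap, which the description above suggests but does not state explicitly. The flavor labels are placeholders and the function name `make_small_sets` is hypothetical.

```python
import random


def make_small_sets(flavors, set_size=6, seed=0):
    """Randomly partition the flavors into disjoint sets of equal size.

    Assumption: the small assortments are non-overlapping, so each flavor
    appears in exactly one small set.
    """
    rng = random.Random(seed)  # fixed seed so the partition is reproducible
    shuffled = list(flavors)
    rng.shuffle(shuffled)
    return [shuffled[i:i + set_size] for i in range(0, len(shuffled), set_size)]


# Placeholder labels standing in for the 30 jelly bean flavors
flavors = [f"flavor_{i:02d}" for i in range(1, 31)]
small_sets = make_small_sets(flavors)

for index, small_set in enumerate(small_sets, start=1):
    print(index, small_set)
```

Assigning participants in the small condition to the five sets in rotation would then present each flavor about equally often across participants.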

The task of the participants was to select one of the jelly beans, eat it, and rate it. To explicitly control for prior preferences, before the actual choice participants were asked if they had ever heard of jelly beans and how often they had eaten them before. As mentioned in the Introduction, what eventually matters might be the perception of variety rather than the number of options per se (Kahn & Wansink, 2004). Therefore, as a manipulation check to test if the two assortments were perceived as different in size, participants also rated the assortment size they saw on a Likert scale ranging from 1 (too few jelly beans) to 9 (too many jelly beans).


In between choosing and tasting, participants were asked to rate the difficulty, frustration, and enjoyment of the choice process and to anticipate their satisfaction with the taste of the selected jelly bean. After tasting, participants rated the satisfaction with their choice and the degree of regret they experienced. They were also asked to rate how likely it was that there was an even better jelly bean on the table that they did not taste and how good the whole assortment of jelly beans that they saw would taste overall. All ratings were made on a Likert scale ranging from 1 (not at all) to 9 (very much). As an additional measure of attractiveness beyond the satisfaction rating of the taste, participants were asked what they would be willing to pay for “a box of 50 jelly beans like the ones you just saw on the table” (in euros).

After completion of the study, each participant received a coupon that could be exchanged for a small box of jelly beans at a secretary’s office that was three floors up and in another wing of the building. Making the effort to redeem the coupon was taken as a proxy to measure motivation. The study was conducted by a skilled experimenter who was unaware of my too-much-choice hypothesis. The experiment took place subsequent to another, unrelated study and participants were paid for the whole time they spent in the lab.



In total, 66 people participated in the study (33 in each condition; 34 women, 32 men evenly split over conditions). Most were students at a local university, and no one was on a diet. The average age of participants in both conditions was 25 years (SD=3.6 years). Of the 66 participants, 23 had never heard of jelly beans prior to the study and of the remaining 43 participants, 24 had never eaten one before. None of the participants ate jelly beans on a regular basis; 19 ate them occasionally. None of the subsequent analyses yielded considerable or statistically significant differences between participants who had eaten jelly beans before and those who had not.

Manipulation check


When asked to rate the perceived size of the assortment in front of them, participants in the large choice condition on average perceived the assortment as larger than those in the small choice condition (5.6 in the large condition vs. 4.2 in the small condition), t(64)=3.14; p=.003. However, given that a 5 denotes the middle of the scale, neither assortment was perceived as being extreme. In the similar manipulation check in the chocolate study by Iyengar and Lepper, which was based on a 7-point Likert scale ranging from “too few” (1) to “too many” (7), the mean value in the large condition was 4.9 as compared to 3.6 in the small condition. While an average of 4.9 on a 7-point scale is also not very extreme, a comparison with my data based on standardized z-values (SD=1 and mean=0) shows that the large assortment in Iyengar and Lepper’s study was perceived as slightly larger than in my study (z=.7 vs. z=.3).

Choice process

Choosing from the large set of jelly beans was perceived as more difficult (6.3 in the large set vs. 3.5 in the small set), t(64)=5.32; p<.001; more frustrating (2.7 vs. 1.5), t(64)=3.45; p=.001; but also as more enjoyable (6.3 vs. 4.8), t(64)=2.78; p=.007, which matches the results reported by Iyengar and Lepper for choices between chocolates outlined in Chapter I.

Enjoyment, Satisfaction, and Regret

Participants who chose from a large assortment anticipated a slightly higher satisfaction with their chosen bean than participants who chose from the small set (6.6 in the large set vs. 6.0 in the small set), but with alpha set at 0.05 the difference is not statistically significant, t(64)=1.27; p=.210. If anything, the slightly higher expectations in the case of the large assortment should make it more likely to find a too-much-choice effect because it increases the chances that the actual experience will fall short of these expectations. However, my data does not show this. Contrary to the predictions of choice overload and in contrast to the findings reported by Iyengar and Lepper, participants in the large choice condition did not differ significantly in their actual satisfaction with their chosen jelly bean. If anything, they were slightly more satisfied than participants in the small choice condition (6.7 vs. 6.2), t(64)=0.91; p=.366 (see Figure 4). Participants in the large condition also reported slightly less regret (1.9 vs. 2.3 in the small condition), t(64)=0.897; p=.37. This is despite the fact that participants in the large choice condition held a stronger belief that there were better options available that they did not choose (5.3 vs. 4.1), t(64)=1.94; p=.056. Also, in the large choice condition, participants evaluated the whole assortment as better tasting overall (5.6 vs. 4.6), t(64)=2.09; p=.04.

Motivation to redeem a coupon and willingness to pay


Participants were willing to pay almost the same amount for a small box of jelly beans in the two conditions (1.70 euros in the large set vs. 1.60 euros in the small set), t(64)=1.2; p=.65. The same holds for the number of redeemed coupons. In the small choice condition, 21 participants redeemed their coupon in the secretary’s office, while in the large choice condition 26 participants did so, t(64)=−1.1; p=0.28; Cohen’s d=−0.27. Figure 4 gives an overview of the main results.
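The relation between the t-value and Cohen's d reported for the coupon comparison can be sketched in a few lines. This is only an illustration, not the computation used in the original analysis: it assumes two independent groups of equal size (33 each, inferred from the 64 degrees of freedom), and the function name `cohens_d_from_t` is hypothetical.

```python
import math


def cohens_d_from_t(t_value, n_group_1, n_group_2):
    """Convert an independent-samples t-value into Cohen's d.

    Uses the standard identity d = t * sqrt(1/n1 + 1/n2).
    """
    return t_value * math.sqrt(1.0 / n_group_1 + 1.0 / n_group_2)


# Assumption: 66 participants split evenly across the two conditions.
d_coupons = cohens_d_from_t(-1.1, 33, 33)
print(round(d_coupons, 2))
```

Under the equal-group assumption, the conversion reproduces the reported effect size of d=−0.27 after rounding.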

Figure 4: Effect of assortment size on satisfaction and percentage of redeemed coupons


Despite the fact that my controlled laboratory experiment closely resembles Iyengar and Lepper’s study, participants in the large choice condition were as motivated to redeem a coupon and as satisfied with their chosen option as the participants in the small choice condition.


Iyengar and Lepper’s main dependent variables, satisfaction with the choice and probability of taking a box of candy as compensation, had effect sizes of d=1.0 and d=0.88, respectively. With these effect sizes, the power to reject a false null hypothesis in the present experiment (N=66, alpha[one-sided]=0.05) was 0.82 and 0.95, respectively, which implies that the probability of obtaining a significant result was high. The fact that I nevertheless did not find the effect suggests that either the actual effect size is much smaller or that there were other variables that either diminished the effect in my study or boosted it in Iyengar and Lepper’s experiment.

Insofar as trade-off aversion drives the effect of too much choice, Dhar (1997) argued that for trivial and repeated decisions, an increase in choice omission due to too many options might rarely be found because individuals might simply choose more than one option or they may choose something else at the next occasion. Thus, maybe more than a jelly bean needs to be at stake in order for choice overload to loom. On the other hand, choosing a chocolate praline is hardly consequential and therefore this explanation does not resolve the difference between the two studies.

Prior preferences

Given that there were no considerable differences between participants who had eaten jelly beans before and those who had not, it seems unlikely that prior preferences can explain why I did not find an effect of too much choice. As I laid out in Chapter I, the existence of prior preferences might explain why one would not find a too-much-choice effect but it is not obvious why the lack thereof should lead to the effect.

Distinction between subjective and behavioral measures


While the results of the jelly bean study do not support the idea of a too-much-choice effect, they point out the importance of having a clear distinction between subjective and behavioral measurements. Based on subjectively perceived difficulty and frustration, one could argue for a too-much-choice effect. However, these emotions did not translate into manifest behavior. The fact that an increasing number of options simultaneously led to more frustration and to more joy makes it difficult to interpret these self-reports on emotional states as dependent measures.

General discussion 

In their original experiments, which I strove to replicate, Iyengar and Lepper (2000) found strong effects of assortment size on the motivation to redeem a coupon and also on the satisfaction with the chosen option. In the face of moderate procedural variation within the three studies that I conducted, the effect did not prove robust. On a general theoretical level, there are at least two different explanations for the differences between the results. First, it could be that the effect of too much choice is actually much smaller than the effects found in the previous studies outlined in Chapter I, and that the difference between my results and those of the studies that found the effect is solely due to unsystematic sampling or random error. Second, it could be that there are systematic differences between the studies that are responsible for the diverging results. In the first case, a meta-analytic integration of the studies would yield a more reliable estimate of the real magnitude of the effect. In the latter case, there should be a systematic and theory-driven search for potential boundary conditions and systematic differences between the studies that have been overlooked so far.

Random variation or moderator variables?

To find out which of the two interpretations is more plausible, one needs to know how likely it is that the differences between the effect sizes are due to mere sampling error. If the differences between the studies are simply due to random variation around a true population effect size, there is nothing left for moderator variables to explain (Hunter & Schmidt, 1990). To statistically test the homogeneity of the effect sizes, one needs to relate the variance between the studies to the error variance within the studies in a Q-test (Cochran, 1954). The Q-statistic can be calculated as

$$Q = \sum_{i=1}^{m} w_i \left(d_i - \bar{d}\right)^2 \qquad \text{(2-1)}$$

with $d_i$ being the effect size of study $i$, $\bar{d} = \sum_{i=1}^{m} w_i d_i \,/\, \sum_{i=1}^{m} w_i$ being the weighted mean effect size, $m$ being the total number of studies, and $w_i$ being a weight that is calculated as the inverse of the squared standard error of $d_i$ (Shadish & Haddock, 1994):

$$w_i = \frac{1}{v_i}, \qquad v_i = \frac{n_1 + n_2}{n_1 n_2} + \frac{d_i^2}{2\,(n_1 + n_2)}$$

with $n_1$ being the sample size of the group that chose from the small assortment and $n_2$ being the sample size of the group that chose from the large assortment.

For the data at hand, the Q-value obtained from Equation 2-1 is 38. Under the null hypothesis of homogeneous effect sizes, the Q-value follows a chi-square distribution with m−1 degrees of freedom. With m=5 and an alpha-value set at 0.05, the critical Q-value is 9.5. As this is smaller than the obtained Q-value of 38, it can be concluded that the distribution of effect sizes is not homogeneous and thus the differences between the studies cannot be explained by mere sampling error or random variance. As a consequence, according to Hedges and Olkin (1985), the further exploration of potential moderator variables seems worthwhile. In the next chapter, I will lay out a series of experiments in which I strived to systematically identify some of the most promising moderator variables.
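The homogeneity test described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the original analysis: the weights are taken to be the inverse of the usual large-sample sampling variance of d, the effect sizes and sample sizes in the example are made up for demonstration and are not the five studies analyzed here, and all function names are hypothetical. The chi-square tail probability uses a closed form that holds only for even degrees of freedom, which covers the case m=5.

```python
import math


def d_variance(d, n_small, n_large):
    """Large-sample sampling variance of a standardized mean difference d."""
    n_total = n_small + n_large
    return n_total / (n_small * n_large) + d ** 2 / (2 * n_total)


def cochran_q(effects, n_small, n_large):
    """Return (Q, weighted mean d) using inverse-variance weights."""
    weights = [1.0 / d_variance(d, a, b)
               for d, a, b in zip(effects, n_small, n_large)]
    d_bar = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    q = sum(w * (d - d_bar) ** 2 for w, d in zip(weights, effects))
    return q, d_bar


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


# Hypothetical effect sizes and group sizes, for illustration only.
example_d = [0.9, 0.8, -0.1, 0.0, -0.3]
example_n_small = [33, 33, 33, 33, 33]
example_n_large = [33, 33, 33, 33, 33]

q_value, mean_d = cochran_q(example_d, example_n_small, example_n_large)
p_value = chi2_sf_even_df(q_value, df=len(example_d) - 1)
print(f"Q = {q_value:.2f}, weighted mean d = {mean_d:.2f}, p = {p_value:.4f}")
```

A Q-value whose tail probability falls below the chosen alpha indicates that the effect sizes vary more than sampling error alone would predict, which is the condition under which a search for moderator variables is warranted.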

© The compilation and layout of the content of this publication, as well as its electronic processing, are protected by copyright. Any use that is not expressly permitted by copyright law requires prior consent. This applies in particular to reproduction, editing, and storage and processing in electronic systems.
DiML DTD Version 4.0
Certified document server of the Humboldt-Universität zu Berlin