3 Method


Chapters 1 and 2 presented empirical findings and conceptual arguments that provided a theoretical background regarding the MU process and how it is influenced by intelligence and dispositional valuations. These theoretical chapters culminated in a number of hypotheses. In the following, the characteristics of the samples (3.1), procedures (3.2), and measures (3.3) that were used to test these hypotheses are discussed.

3.1  Sample Characteristics

3.1.1  Rationale for Sample Selection

To address the study’s research questions, data from four samples of participants were collected. The first group consisted of intellectually gifted individuals who are members of Mensa, an organization for the gifted. The second sample consisted of high achieving and average achieving alumni from two major Berlin universities. The third group consisted of Internet users. Finally, the fourth group consisted of university students who took part in a lab experiment. The choice of samples and the accompanying study designs was guided by a number of considerations that are described briefly below.

Samples 1.I and 2 were included to address the main and dyadic effect hypotheses using self-ratings of intelligence and ego-centered social networks in samples that differ in their cognitive level and the importance they attach to this domain. Mensa members have been tested with an intelligence test, so they are aware of their giftedness status and apparently attach a high importance to this fact. In contrast, the highly achieving alumni sample may also be regarded as gifted in terms of their academic achievements, but these individuals do not necessarily regard themselves as such. Finally, the averagely achieving alumni can be hypothesized to be more diverse in terms of their intelligence and the importance attached to this trait.


To avoid sole reliance on self-reports, Samples 1.II and 3 were added. In Sample 1.II, Mensa members’ reports were complemented with data from some of their network partners in order to identify possible biases in social judgment. This allowed a more stringent test of main and dyadic effects. Participants in Sample 3 took a psychometric vocabulary test to investigate the main effects of crystallized intelligence on social network characteristics without relying on self-ratings.

The data collected in Sample 4 served to study the impact of intelligence and dispositional valuations on MU in the context of interactions between strangers in a more controlled, experimental setting. This allowed an examination of the effects of personality differences on social relationships that are not self-selected. Moreover, the intelligence and dispositional valuations of each member of the dyad were measured independently with psychometric instruments as well as rated by both persons, which allowed for the disentanglement of the effects of “true” vs. “ego-centric” similarity on MU. Finally, the effect of similarity expectations was tested by an experimental manipulation that informed subjects that they were either very similar or very dissimilar to each other (compared to a control group receiving no information). In the following, descriptive information regarding the composition of each of the samples is provided.

3.1.2 Sample 1

With over 100,000 members worldwide, Mensa is the largest international organization for gifted individuals. As the sole membership criterion, members need to surpass the 98th percentile of the intelligence distribution (i.e., have an IQ higher than 130), proof of which, in the form of an official intelligence test result, is required upon admission. In Germany, Mensa has about 4,500 members (Mind-Magazin, Nr. 39). Against a statistically possible member pool of well over 1 million Germans (using the 98th percentile criterion), powerful self-selection mechanisms can be expected.


Participants were recruited in two ways. First, a popular scientific article on the subject matter of the current study was written for the magazine “Mind”, the bimonthly periodical of Mensa Germany. Although every member receives the magazine, it is not known how many actually read it. Included in the article was a call for participation in the current study. Participants could either order a paper-and-pencil (P&P) version of the questionnaire or fill out an online version. To ensure the giftedness status of the participants, they were required to provide their Mensa membership number upon participation. This recruitment method resulted in a total of 273 Mensa members who completed either the P&P or the Internet version of the questionnaire. Eleven of these cases were excluded because they were either younger than 18 or older than 120. In addition, 24 persons were excluded who did not provide personality information and did not name at least four network persons, five persons because they did not provide any social network information, and one person whose questionnaire responses lacked meaningful variance. This procedure resulted in a total of 232 cases with usable data. These participants will be referred to as Sample 1.I.

Second, a call for participation was sent to the electronic mailing lists of the Mensa branches in Berlin and Hamburg and to the division of Mensa members aged around thirty (U3SIG). This fully electronic procedure resulted in a total of 472 individuals accessing the emailed URL link. Of these, 39 individuals stated they were younger than 18 or older than 120, 332 individuals did not provide personality information and did not name at least four network persons, and two individuals did not show any variation in the ratings of their network partners’ intelligence. Accordingly, a total of 99 usable participants remained (45 females, average age 30.9). Because the study procedure for these participants was different (in terms of the questionnaire that was used and the availability of dyadic information), they will be referred to as Sample 1.II.

Table 3 summarizes the characteristics of the total Mensa sample. Of 331 cases, 270 (82%) had used the Internet version and 61 the P&P version of the questionnaire. The mean age in this group was 34.0 (SD 9.0), which corresponds closely to the mean age of all Mensa members (Mind-Magazin, Nr. 39). With 51% of the participants being female, the sample was fairly gender-balanced. However, because the Mensa member pool has more men than women (2:1; Alain May, personal communication, 2004), this implies a selection bias favoring participation by women. The 273 Mensa members of Sample 1.I (who reacted to the call for participation in the Mensa magazine) provided data regarding their level of education. Of these, 14.5% reported the German Gymnasium (academic secondary school) as their highest education level, 23.2% an apprenticeship (Ausbildung), 47.7% some form of university degree, and 8.7% a PhD. This represents quite a broad range of educational status for an intellectually gifted sample.


Table 3 Description of Demographic Information and Study Procedure Across Samples

[Columns: Sample 1 (MENSA members), Sample 2, Sample 3 (Internet users), Sample 4 (university students / alumni). Rows: assessment period, mean age (SD) (Sample 1: 34.0 (9.0)), % female, dyadic data. The remaining cell values could not be recovered.]

Note. HA = High achievers, AA = Average achievers, SR = Self-ratings, TEST = Psychometric test
a 174 partner-reports for a subsample of n = 40.
b Self-reported IQ test results for a subsample of n = 76.

3.1.3 Sample 2

High Achieving (HA) Alumni

With the permission of the university’s privacy commissioner (“Datenschutzbeauftragter”), the examinations office of the Humboldt University Berlin was asked to provide the addresses of the university graduates whose final grade belonged to the top 16% of their peers within the same faculty. This criterion includes all graduates who scored at least one SD above the mean achievement of their peers; although arbitrary, this cutoff secured an adequate sample size without being overinclusive.

Data collection was carried out in three waves between March 2003 and June 2003, when 322 former students of the Humboldt University Berlin and the Free University Berlin who had received their degrees during the past year were contacted. Because many alumni move to another city upon graduation, it was expected that not all questionnaires would reach their target. In the first wave, 13% of all questionnaires could not be delivered to the given address and were returned to sender. Because some additional questionnaires may have successfully arrived at the specified address but the targeted person no longer lived there, the percentage of unsuccessfully contacted alumni was estimated at 15%. Because of the anonymous nature of the participant recruitment, it was not possible to send reminder letters, which might have raised the response rate.


Between March 2003 and February 2004, the estimated 274 successfully delivered participation letters (85% of 322) sent to the high achieving graduates resulted in 174 filled-out P&P or Internet questionnaires. Applying the same exclusion criteria as in Sample 1 excluded 20 respondents, resulting in a total of 154 participants. This corresponds to a response rate of 56%, which is quite high given the absence of financial reward and the fact that the questionnaire took about 60 minutes to fill out. Moreover, the gender composition of the sample was quite balanced, with 56% females. The mean age was 27.1 years (SD 2.3).

Average Achieving (AA) Alumni

Parallel to contacting the top 16% achievers, university graduates who fell between the 42nd and 58th percentile of academic achievement (the middle 16%) were contacted. Across all three waves, 317 former students were contacted. Against an estimated 269 effectively delivered questionnaires, a total of 89 people responded by contributing at least some data. Applying the exclusion criteria described above resulted in 18 excluded cases, bringing the total number of participants in this group to 70. The low response rate of 26% suggests that selection biases were strong in this sample. This was reflected in the more skewed gender ratio (65% women), though the average age (27.8, SD 3.4) was very similar to that of the highly achieving group.

Some differences between the highly achieving and the average achieving alumni may be due to differences in the invitation letters they received. In the highly achieving pool, subjects were praised for their high achievement and told that their responses were particularly interesting for studying the relation between giftedness and social relationships. In contrast, the average achieving subjects were told that the study was directed at gifted individuals but that academic achievement is not determined by intellectual ability alone, so people with varying achievement levels were contacted. In hindsight, these individuals may have suspected that they had been assigned to a control group of lower-achieving subjects, which could have discouraged participation.

3.1.4 Sample 3


Because of the biased nature of Samples 1 and 2, an additional sample of Internet users was recruited. For this purpose, a description of the study together with a call for participation was published on a number of German-language websites dedicated to online psychological research, including the PSYTESTS portal of the Humboldt University Berlin (http://www.psytests.de). As an incentive for participation, feedback regarding the Big Five and the level of verbal intelligence was offered after completion of the questionnaire.

A total number of 845 individuals aged 18 or older accessed the questionnaire site. As is usual in online research, some of these individuals only “glanced through” without providing any information. In the present case, a total of 301 individuals failed to complete either the Big Five questionnaire or the verbal intelligence test (80 subjects in this subgroup also failed to provide information on at least four social relationships, and one subject failed to provide gender information). Moreover, 16 individuals did not show any variability in the rated intelligence of their network partners. Thus, the final sample comprised 528 participants. Of these individuals, 78% were female, and the average age of the sample was 27.9 (SD 8.3).

3.1.5 Sample 4

The fourth sample consisted of university students or alumni who reacted to an article in the Humboldt University newspaper about the impact of personality on interpersonal communication or to flyers that were widely distributed in places frequented by Berlin students (university buildings, university restaurants, university bus stops, etc.). As an incentive to participate, a sum of 15 Euro (around $20) and a personal feedback profile were offered. The topic of the study was described as “interpersonal communication”.


Participants were required to apply for participation via an online questionnaire. A total of 433 people visited the corresponding website, which also included a more detailed description of the study. Of these potentially interested individuals, a total of 200 proceeded to fill out the pretest questionnaire and left an email address or phone number where they could be contacted. To ensure a balanced sex ratio, the questionnaire for each gender was closed after 100 participants of that gender had taken part (this took longer for the male participants).

During the course of the study, 144 participants actually visited the laboratory and completed the full experiment. Of the remaining 56 individuals, 6 could not be contacted, 34 cancelled their participation or did not show up for the scheduled meeting, 8 came to the lab but could not be videotaped because their scheduled partner did not show up, and 8 were excluded because they were not German native speakers (this was a declared requirement for participation to avoid confounding of language skills and communication quality).

The remaining individuals of the fourth sample consisted of 74 females and 70 males, and the mean age was 24.1 (SD 3.9). The participants had diverse backgrounds in terms of their university major, with 30 (21%) students/graduates from a language or philosophy department (mostly German language, foreign language, and philosophy students), 38 (26%) from the social sciences (including 15 psychologists), 13 (9%) from law or economics, 44 (31%) from the natural sciences, and the remaining 13% from other disciplines (medicine, agriculture, engineering, and art).

3.2 Procedure

3.2.1  Studies 1.I, 2, and 3


Participants in Samples 1, 2, and 3 filled out the measures used in the present study at home without being monitored. To answer participants’ questions regarding the study’s procedure, a special email address and a telephone hotline were created, but these options were only rarely used. Apparently, participants had no trouble completing the questionnaire. The order of presentation of the scales was fixed, starting with the social relationships questionnaire, followed by the intelligence, self-concept, and Big Five self-ratings. The only exception to this fixed order was Sample 3, where the psychometric vocabulary test replaced the self-concept scales.

3.2.2 Study 1.II

The 99 individuals of Sample 1.II filled out a modified version of the social network instrument described in Section 3.3.1. After compiling a list of their network partners, participants had the option to invite each partner to take part in the current study. This option was chosen by 71 participants for at least one network partner. In total, 607 relationship partners were contacted by the Psychological Institute via regular mail or email, or directly received a printed questionnaire from the participants themselves.

The questionnaire for the contact persons was available in both an online and a P&P version and required the contact persons to fill out the NEO-FFI questionnaire, rate the person who had invited them to participate in the study (i.e., the Mensa member) in terms of intelligence, and assess the quality of the relationship with that person (α = .75). A total number of 174 contact persons (94 females; response rate 29%) of 40 Mensa members contributed data, of whom 51 used the P&P and 123 the Internet version. The mean age of these individuals was 32.9 years (SD = 12.4).

3.2.3 Study 4


At the end of the pretest questionnaire in Study 4, participants were invited to visit the Psychological Institute in Berlin Adlershof, where they would discuss up to three personally important life domains with another person. They were told that this conversation would be videotaped.

Creation of Dyads

To create dyads with a maximum variance in psychometric intelligence levels, an experimental manipulation was performed. For this purpose, a composite intelligence score was calculated from the results of the available pretest information. As can be seen in Table 4, the vocabulary test was characterized by a marked range restriction (i.e., the SD was much smaller than the typical 15 points), which precluded the usual procedure of z-standardizing the results. Instead, both the numerical and the vocabulary results were scored according to available norm data. For the numerical test, comparison data from a sample of 279 Gymnasium students were used, whereas the vocabulary test was normed according to data from 159 representative adults from the German ALLEE study (aging and life experience; Lang, Lüdtke, & Asendorpf, 2001).
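The norming step amounts to projecting each raw score onto the IQ metric (M = 100, SD = 15) via the mean and SD of the external norm sample. The following minimal sketch illustrates this; the function name and the example values are hypothetical, not taken from the study:

```python
def norm_to_iq(raw_score, norm_mean, norm_sd):
    """Convert a raw test score to the IQ metric (M = 100, SD = 15),
    using the mean and SD of an external norm sample instead of
    z-standardizing within the range-restricted study sample."""
    return 100.0 + 15.0 * (raw_score - norm_mean) / norm_sd

# Hypothetical example: a raw score of 31 against a norm sample with
# mean 27 and SD 6 corresponds to an IQ of 110.
print(norm_to_iq(31, norm_mean=27, norm_sd=6))  # 110.0
```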

As can be seen in Table 4, the norming procedure resulted in somewhat different mean IQ scores, with participants scoring higher on the numerical than on the vocabulary test. In light of the somewhat younger comparison group for the numerical test and the high proportion of participants from the natural sciences, this was not unexpected. The mean vocabulary IQ was about 10 points lower than the numerical IQ and the SD was markedly different from the 15 points that would be expected in a perfectly representative sample. Nevertheless, the composite mean value of 115.7 seemed a reasonable estimate of the average IQ of university students.


Unexpectedly, the numerical test was only weakly correlated with the vocabulary test (r = .13, p = .07). Because the reliabilities of these tests were adequate, this does not seem to result from a psychometric artifact. A method bias resulting from the Internet testing also seems unlikely, since this would have created an artificial but systematic source of variance, resulting in higher rather than lower correlations. Two explanations remain. First, the size of the correlation may have been truncated by restriction of range. Second, the lack of association between the two tests may be due to an advanced level of cognitive specialization (Lubinski, Webb, Morelock, & Benbow, 2001) in Sample 4. For example, the average numerical IQ of mathematics students was 123.8 (SD = 12.3) vs. 117.6 (SD = 12.6) for language/arts students, a significant difference, F(2, 74) = 4.42, p = .04. In contrast, language students had a higher vocabulary IQ than mathematics students (M = 114.7 vs. 108.6, SD = 8.5 vs. 8.3, respectively), F(2, 74) = 9.52, p = .01.

After a sufficient number of participants (n ≥ 20) had completed the pretest questionnaire, same-sex dyads of minimum and maximum IQ difference were created alternately. When a participant was assigned to the minimum difference group, a partner was selected whose IQ differed by less than two IQ points. In the maximum difference group, a partner with more than 10 points difference was selected. This resulted in 38 similar and 34 dissimilar dyads, with mean IQ differences of .6 and 18.1 IQ points, respectively (p < .01). It should be noted, however, that this experimental manipulation assumes a general intelligence factor. Because of the lack of a significant association between numerical intelligence and vocabulary, effects of the manipulation cannot be interpreted as effects of similarity in general intelligence.
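The alternating assignment rule can be sketched as follows. This is a simplified illustration, not the original assignment script; the function name is hypothetical, while the thresholds follow the text (difference < 2 points for similar, > 10 points for dissimilar dyads):

```python
def assign_dyads(pool):
    """Alternately form minimum-difference (< 2 IQ points) and
    maximum-difference (> 10 IQ points) dyads from a pool of
    same-sex (participant_id, iq) tuples, in order of arrival."""
    dyads, waiting, want_similar = [], list(pool), True
    while waiting:
        person = waiting.pop(0)
        if want_similar:
            matches = [p for p in waiting if abs(p[1] - person[1]) < 2]
        else:
            matches = [p for p in waiting if abs(p[1] - person[1]) > 10]
        if matches:
            partner = matches[0]
            waiting.remove(partner)
            dyads.append((person[0], partner[0],
                          "similar" if want_similar else "dissimilar"))
            want_similar = not want_similar
        # persons without a suitable partner in this batch remain unpaired
    return dyads
```

For example, with a pool of [("a", 100), ("b", 101), ("c", 100), ("d", 115)], the sketch pairs "a" with "b" as a similar dyad and "c" with "d" as a dissimilar dyad.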

Table 4 Normed Results and Psychometric Properties of Intelligence Tests Used in Sample 4

Test                     Norm score (SD)    Norm sample
Numerical test           121.3 (14.3)       279 Gymnasium students (average age 17.7 years) a
Figural test (IST)       104.3 (8.8)        415 German young adults (age 26-30 years) b
Vocabulary test (MWT)    110.0 (8.8)        159 German young adults (age 20-40 years) c

a See Wilhelm (2000)
b See Amthauer et al. (2001)
c Norm data provided by F. R. Lang (see also Lang et al., 2001)

Experimental Setting and Instructions


After both partners had completed the figural test, they were brought together into a comfortably furnished experimental room. In the middle of the room stood a round table with two glasses and a water carafe (see Appendix 7.1). The interaction evaluation questionnaires were also placed on this table. Each participant was randomly assigned to a seat that was fixed in place to the left or right of the table. At the other side of the room, an unobtrusive camera was attached to the wall and filmed the interaction that followed.

After the participants were seated, a second manipulation was performed. In approximately half of the dyads (39), participants were given feedback regarding the relative similarity of their IQ results. In 19 dyads of the similarity condition, this feedback informed them that both participants had achieved highly comparable IQ scores. In 20 dissimilar dyads, the feedback stated that IQ results had been very different. No information regarding a single individual’s absolute intelligence level was given.

After the feedback manipulation, the student assistant asked the participants if they had any further questions and then left the room for a separate video control station, where she monitored and videotaped the interaction. Via a microphone, she remained in contact with the participants and explained that the interaction format required the person in the right chair (seen from the camera) to be the first to talk about his or her important life domains. The person in the left chair was instructed to interview his or her partner about the reasons why he or she regarded this life domain as personally important.


It was explicitly stressed that the purpose of this interview was not for the interviewee to “justify” his or her life domain. Rather, the interviewer was to achieve an understanding of the subjective role of the domain in the life of the interaction partner. For this purpose, questions like “Why is this life domain so important to you?” or “What do you associate with this life domain?” were suggested. The interviewer was instructed to open the interview with the standard question “About which life domain do you want to talk first?” Participants were informed that they could talk about up to three life domains in each interaction half.

After the instruction, the participant in the left chair started the interaction by playing the role of the active interviewer, whose task it was to explore the subjective meaning of the other person’s life domains. This interaction half was interrupted by the student assistant after 10 minutes, who instructed the participants to pick up the questionnaire on the table and fill out the section regarding the first interaction half (the student assistant kept monitoring the participants during this time). When the participants were ready, they proceeded to the second interaction half, in which the roles of interviewer and interviewee were reversed. After this, the student assistant instructed the participants to fill out the second part of the questionnaire. When they had finished, participants were thanked, debriefed, and paid.

3.3 Measures

In the following, the instruments that were used to assess MU, intelligence, dispositional valuations, and control variables are described. These instruments include both psychometric instruments (summarized in Table 4) and self-report measures (summarized in Table 5).


Table 5 Overview of Self-Report Scales Across Samples

Self-Report Instrument                          Sample 1   Sample 2   Sample 3   Sample 4
Self-concept of peer relationships (SDQ-III)
Self-concept of intelligence
Big Five (NEO-FFI)
Big Five (BFI)
Values (RVS)
Interests (AIST)
Loneliness (UCLA Loneliness Scale)

[Cell entries indicating which scale was administered in which sample could not be recovered.]

3.3.1  Mutual Understanding and Related Constructs

Social Network Characteristics (Samples 1-3)

Social network characteristics were sampled with a measure taken from Asendorpf and Wilpers (1998) and Neyer (1997) (see Appendix 7.2). In a first step, this measure requires participants to list all personally meaningful persons with whom they interact at least once per month. Contact persons were sampled from a wide range of family and non-family categories. Additionally, data on age, sex, duration of the relationship (between 1 = less than one year, and 4 = more than five years), and contact frequency (between 1 = once per month or less, and 5 = daily) were collected.

Table 6 summarizes some average features of the social networks of the different samples. As can be seen, Mensa participants reported an average of 16.4 network partners, whereas the university graduates reported an average of 22 to 23 partners. A similar difference was found for the category of friends, with Mensa members reporting fewer friends in their social network (4.8) compared to the alumni samples (between 9.1 and 10.3). Most participants in Samples 1 and 2 mentioned their mothers and fathers as members of their social network, whereas at least half also mentioned a romantic partner. On average, the social networks people reported were quite diverse in terms of demographic variables, with a reasonably balanced age distribution and gender ratio (close to 50% in all samples).


The social network data can be compared with data from Neyer (1999), who used an almost identical (P&P) instrument in a sample of N = 495 representative German adults (aged between 17.1 and 29.8 years, M = 24.3, SD = 3.7). In this study, participants reported an average number of 17.9 (SD = 8.5) network partners, with 51% females. On a 3-point Likert scale, they also reported the age of their network partners and the frequency of contact. The average age score was 2.2 (i.e., close to 2 = “about the same age”), and the average contact frequency category was 2.6 (i.e., between 2 = “multiple times a month” and 3 = “once a week”). Finally, participants listed 0.9 mothers, 0.9 fathers, 0.8 partners, and 5.7 friends. As can be seen in Table 6, the network composition reported by Neyer (1999) best matches the social networks of Sample 1 (Mensa members).

In a second step, participants were asked to rate every contact person on the following dimensions (in order of appearance): importance (what impact would the termination of the relationship have: 1 = I would feel better, 5 = I would be strongly burdened for a long time; assessed in Samples 2 and 1.I), felt closeness of the relationship (1 = very distant, 5 = very close), frequency of conflict (1 = never, 5 = almost always), opportunity for meaningful communication (about themes that are important to you: 1 = not at all, 5 = very good), availability of emotional support (1 = never, 5 = for almost every problem), felt understanding (1 = very much misunderstood, 5 = very much understood), and felt acceptance (to what degree do you feel accepted by this person: 1 = not at all accepted, 5 = completely accepted; assessed in Samples 3 and 1.II).

Table 6 Social Network Composition Across Samples

                        Sample 1:       Sample 2:    Sample 2:    Sample 3:
                        MENSA members   HA alumni    AA alumni    Internet users   F a

(label not recovered)   1.0 (0.3)    0.9 (0.4)    0.6 (0.5)
(label not recovered)   1.0 (0.3)    0.9 (0.4)    0.6 (0.5)
(label not recovered)   4.4 (4.2)    10.6 (6.0)   9.0 (5.4)    3.7 (3.3)
(label not recovered)   0.5 (0.5)    0.7 (0.5)    0.7 (0.5)    0.6 (0.5)
(label not recovered)   14.2 (9.5)   22.8 (8.5)   21.8 (8.8)   10.1 (6.1)
(label not recovered)   39.7 (8.6)   35.2 (4.2)   36.8 (6.2)   36.0 (7.2)
gender (% female)       0.5 (0.2)    0.5 (0.1)    0.5 (0.1)    0.6 (0.2)
rated intelligence      15.9 (2.2)   14.6 (1.7)   14.5 (1.9)   13.3 (2.6)
contact duration        3.3 (0.5)    3.2 (0.5)    3.3 (0.4)    3.3 (0.4)
contact frequency       2.7 (0.7)    2.5 (0.5)    2.4 (0.5)    3.1 (0.7)

[Labels for the Sample 2 columns are inferred; F values and the labels of the first six rows could not be recovered.]

Note. k = average frequency of reporting a relationship category
a Univariate difference between samples, df (between) = 3, df (within) = 985-992
* p < .05 ** p < .01

Self-concept of Social Relationships With Peers (Samples 1-2)


To measure the participants’ self-assessments of the quality of their relationships with same-sex and opposite-sex peers, 8 items from the German translation of the Self Description Questionnaire III (SDQ-III; Marsh, 1992) were used (5-point Likert scale). These items were drawn from a study by Schwanzer (2002), who created short 4-item versions of the SDQ-III scales (half of the items negatively framed) on the basis of item-total correlations (see Appendix 7.3). In the current study, both scales had good reliabilities (α = .81 for same-sex peers; .84 for opposite-sex peers).

Loneliness (Samples 1-2)

Mixed with the SDQ-III items, 10 items from the German translation of the UCLA Loneliness Scale (Döring & Bortz, 1993) were included to measure subjective feelings of loneliness on a five-point Likert scale. Five of these items referred to social loneliness, whereas the remaining five concentrated more on emotional aspects of loneliness. Because these two facets correlated very highly (r = .71, p < .01), they were combined into a composite loneliness score (α = .83).

Evaluation of Dyadic Communication

After each interaction, participants completed a short questionnaire (5-point Likert scale) assessing the level of felt understanding (4 items, e.g., “I [the interviewee] succeeded in explaining to the interviewer [me] what personal meaning the discussed life domains have for me [him/her]”), empathic ability of the interviewer (4 items, e.g., “It was often difficult for me [the other person] to follow the thoughts of the interviewed person [my thoughts] with my [his/her] questions”), interaction flow (4 items, e.g., “I did not enjoy the conversation”), and comfort (1 item, “I felt relaxed during the conversation”). Some of the items of this questionnaire were adapted from Hecht’s (1978) Communication Satisfaction Inventory, whereas others were constructed especially for the current dissertation (see Appendix 7.4).

3.3.2 Intelligence

Intelligence Ratings (Samples 1-4)


Following the ratings of social relationship quality, participants were asked to rate their own intelligence (in Samples 1-2) and the intelligence of each contact person (Samples 1-4). For this, the unpublished “Intellectual Ability Questionnaire” developed by O. Wilhelm (2000) was used (see Bailey & Lazar, 1976, for a similar measure). Participants were first instructed about the intelligence distribution in the population with the help of a graphical normal curve (see Appendix 7.5). In a next step, participants were asked to rate the intelligence of every contact person as well as their own intelligence on a 1 (0-5%) to 20 (95-100%) percentile scale.

Self-ratings of intelligence have been shown to be moderately accurate in predicting psychometric intelligence. Several reviews of the relevant literature put the validity of self-ratings at approximately .30 (Furnham, 2001; Paulhus, Lysy, & Yik, 1998). Appendix 7.6 lists some empirical studies that calculated the correlation between psychometrically measured and self-rated intelligence. As can be seen, these studies report an average correlation of .29, which is consistent with previous reviews. This value is also similar to, though slightly lower than, the agreement between self-ratings and ratings by informed acquaintances (Borkenau & Liebler, 1993: r = .29; Paulhus & Morgan, 1997: r = .37; but see Bailey & Mettetal, 1977b).

Especially with regard to self-ratings of intelligence, this level of predictive validity has been regarded as disappointing (Paulhus et al., 1998). Indeed, a correlation of .3 between measured and rated intelligence is considered small (Cohen, 1992). However, it should be noted that most studies using college students as participants suffer from restriction of range in intelligence, which reduces predictive correlations. For example, Paulhus et al. (1998, p. 549) found a correlation of .22 between (single-item) self-ratings and psychometric intelligence, but applying a correction formula increased the correlation to .30-.35. In addition, intelligence rating scales are often not very reliable. Again using Paulhus et al.’s (1998) data as an example (they reported an alpha of .43 for single-item ratings and .81 for the psychometric test), correcting for attenuation increased the “true validity” of the single-item ratings to levels above .50, which is more acceptable. Accordingly, the validity of single-item self-ratings seems “strong enough to be useful in [nomothetic] research, if not in diagnosing individuals” (Paulhus et al., 1998, p. 549).
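The attenuation correction referred to above is Spearman’s classic formula, which divides the observed correlation by the square root of the product of the two reliabilities. A minimal sketch of the arithmetic, applied to the range-corrected value of about .33 and the reliabilities reported by Paulhus et al. (1998); the exact input value of .33 is an assumption within the .30-.35 range given in the text:

```python
import math

def disattenuate(r_obs, rel_x, rel_y):
    """Spearman's correction for attenuation: estimate the correlation
    between two constructs from an observed correlation and the
    reliabilities of the two measures."""
    return r_obs / math.sqrt(rel_x * rel_y)

# Reliabilities from Paulhus et al. (1998): alpha = .43 for single-item
# self-ratings and .81 for the psychometric test. Applied to a
# range-corrected correlation of about .33:
print(round(disattenuate(0.33, 0.43, 0.81), 2))  # 0.56, i.e., above .50
```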


Because of the evidence for the (modest) validity of intelligence ratings, the current study used them as proxies for general intelligence (Studies 1-3). However, it needs to be borne in mind that intelligence ratings are biased by a number of sources. First, such ratings are prone to self-serving biases (Gabriel, Critelli, & Ee, 1994; Dunning & Cohen, 1992). Second, intelligence ratings have been shown to be confounded by stereotypes associated with a number of factors, such as gender (Furnham, 2001; Rammstedt & Rammsayer, 2000), age (Furnham, 2001), and physical attractiveness (Zebrowitz, Hall, Murphy, & Rhodes, 2002). To adjust for some of these stereotypical influences, the current dissertation used the residuals of a regression analysis predicting intelligence ratings from age and gender. The degree of physical attractiveness was not measured in the current study and could thus not be corrected for.
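The residualization step can be sketched as follows. The data here are simulated for illustration only; the regression coefficients are arbitrary and not those of the actual study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
age = rng.uniform(20, 70, n)
gender = rng.integers(0, 2, n).astype(float)   # dummy-coded gender
# simulated ratings with built-in age and gender "stereotype" effects
ratings = 14 + 0.03 * age - 0.6 * gender + rng.normal(0, 2, n)

# regress ratings on age and gender; keep the residuals as adjusted scores
X = np.column_stack([np.ones(n), age, gender])
beta, *_ = np.linalg.lstsq(X, ratings, rcond=None)
adjusted = ratings - X @ beta

# the residuals are, by construction, uncorrelated with age and gender
print(abs(np.corrcoef(adjusted, age)[0, 1]) < 1e-10)  # True
```

The residuals carry the rating variance that is not linearly predictable from age and gender, which is what makes them usable as stereotype-adjusted scores.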

In the current study, there were significant differences in self-ratings between the samples, F(3, 520) = 239.12, p < .01. As expected, Mensa members (Sample 1) rated themselves as very high in intelligence (M = 19.7, SD = 0.8), followed by the highly achieving (M = 16.3, SD = 2.1) and averagely achieving (M = 15.7, SD = 2.3) university alumni (Sample 2). The university students comprising Sample 4 had the lowest self-ratings (M = 14.2, SD = 2.3). Average ratings in all samples of university students/alumni were higher than the scale midpoint, which is not unexpected given their high educational status. As can be seen in Table 6, network partners were also rated as above average in intelligence, with the Mensa members reporting the most intelligent partners (M = 15.7, SD = 1.9), followed by the highly achieving and averagely achieving university alumni (M = 14.6, SD = 1.7 and M = 14.5, SD = 1.8, respectively) and the Internet users (Sample 3) coming last (M = 13.2, SD = 2.4). This difference is significant, F(3, 949) = 73.69, p < .01.

Self-Concept of Intelligence (Samples 1-2)

In Samples 1 and 2, the self-concept of intelligence was assessed alongside the SDQ-III scales (see Appendix 7.3) with four specifically devised items (half of them negatively formulated) that were calibrated towards a high intelligence level to avoid ceiling effects in the gifted sample (e.g., "compared to others, my level of intellectual abilities is unusually high"). This scale had a 1-5 Likert format and very good internal consistency (α = .87).

Psychometrically Tested Numerical Intelligence (Sample 4)


Numerical intelligence has been found to be a good estimate of fluid intelligence (Bickley et al., 1995), especially when tested in the context of new problems. Sample 4 took a test developed by O. Wilhelm (2000; see Appendix 7.7). This test requires participants to complete 17 series of 9 numbers each. To find the solution, it is necessary to discover the regularity in the first 7 numbers and then to apply this rule to the two empty slots. Item difficulties ranged from .94 (Item 4) to .53 (Item 12), with an average of .77 (average SD = .39). For the 144 participants who also completed the laboratory phase, alpha reliability was .77 (item-total rs between .09 and .60), which is comparable to the .73 reported by Wilhelm (2000). The test loaded very highly on a general intelligence factor in a battery of 12 tests (Wilhelm, 2000).

Psychometrically Tested Figural Intelligence (Sample 4)

As a test of figural intelligence, the Matrices subtest of the Intelligence Structure Test [Intelligenz-Struktur-Test] 2001-R (IST-2001; Amthauer et al., 2001) was used. This test is very much akin to Raven's Progressive Matrices (see Figure 4), which is considered one of the best markers of fluid intelligence. The IST-2001 Matrices test consists of 18 series of 3 figures and 2 series of 8 figures that are built up according to a rule. Out of 4 alternatives, participants need to choose the figure that would complete the series. Following the manual, a time limit of 10 minutes was set for the test, during which participants were allowed to skip items and proceed to the next one.

On average, participants answered 14.1 items, with difficulties for answered items ranging from .95 (Item 1) to .16 (Item 20), paralleling the values reported in the test manual. The reliability of this speeded test was estimated by correlating (using Spearman's rho) the number of correct odd items with the number of correct even items and stepping up this index with the Spearman-Brown formula. This resulted in an estimated reliability of .65, which comes close to the .70 reported in the manual for Gymnasium students. Judged against the .70 criterion regarded as acceptable by Nunnally (1978), this value is somewhat low, even for a short test of 20 items. Accordingly, results based on this test should be interpreted with some caution.

Psychometrically Tested Vocabulary (Samples 3-4)


In Samples 3 and 4, vocabulary was measured with the Multiple-Choice Vocabulary Test [Mehrfachwahl-Wortschatztest] (MWT; Lehrl, 1995). This test consists of 35 sets of five alternative letter combinations, only one of which is a correctly spelled word. In the manual, Lehrl summarizes the results of 26 studies that report a median correlation of .71 with several global intelligence tests. In both samples, a large number of items (12 in Sample 3, 13 in Sample 4) were answered correctly by almost all participants (difficulty ≥ .90; Items 1, 3, 4, 6-8, 10, 11, 14-16, 22, 26), with an average difficulty of around .70 (SD = .11). The number of correctly answered items was used as the total score (α = .70 in both samples).

Self-Reported IQ-Test Results (Sample 1)

In Sample 1, a total of 76 Mensa members (32%) provided the result of their latest IQ test.24 Although the accuracy of this information was not checked, the voluntary nature of participation in the current study makes faked answers less likely. As expected, the mean intelligence level was very high and severely restricted in range (M = 135.6, SD = 4.4). An arbitrary cutoff of 135 (the median score) was used to create a "moderately gifted" (mean IQ = 132.5, SD = 1.5, n = 42) and an "extremely gifted" group (mean IQ = 139.4, SD = 3.6, n = 34).

3.3.3 Dispositional Valuations

Interests

Interests were measured with a short version of the German General Interests Structure Test [Allgemeiner Interessen-Struktur-Test] (AIST; Bergmann & Eder, 1992), with scales corresponding to Holland's (1959) six basic interests (see Section 2.1.3). The unpublished short version (see Appendix 7.8) included 18 items and was developed by G. Nagy of the Berlin Max Planck Institute for Human Development on the basis of a factor analysis of the original 60 items (personal communication, 2 February, 2005). The items of the short form were selected on the basis of their discriminant factor loadings (i.e., high loadings on one factor, small loadings on all other factors), which makes them better suited for the calculation of profile similarity,25 as was done in the current study. Alpha reliabilities were mostly good (Realistic interests .81; Artistic interests .81; Social interests .80; Enterprising interests .82), except in two cases (Investigative interests .60; Conventional interests .67).

Values


Values were assessed with a German version of the Rokeach Value Survey (RVS) adapted by Todt (1989). Participants were required to rank 17 end goals in terms of their subjective value. In the current study, the three most valued goals were health, friendship, and love (rank M = 5.6, SD = 3.9 for health; M = 5.2, SD = 3.5 for friendship; M = 4.8, SD = 3.9 for love). The least important value was material wealth (rank M = 13.0, SD = 4.6). The median correlation between the different values was -.08,26 with correlations ranging between -.46 (between humanity and leisure time) and .36 (between children and family life). Because of the single-item nature of the RVS, reliability coefficients could not be calculated. The forced (relative) independence of the RVS items provides a good basis for calculating profile correlations to assess the similarity in values between two persons.

Openness to Experience

In Samples 1, 2, and 4, openness to experience was assessed with the German version of the NEO Five-Factor Inventory (NEO-FFI; Borkenau & Ostendorf, 1993), using a 5-point Likert scale. Sample 3 completed the German version of the Big Five Inventory (BFI; Rammstedt, 1997). To ensure BFI scales of equal length (the original instrument has 7-10 items per scale), the seven highest-loading items according to a study by Lang et al. (2001) were selected for use in the current study. With one exception, all items of the Openness scale are formulated in the positive direction (items of the other Big Five scales are phrased in both the negative and the positive direction). As can be seen in Table 7, the NEO Openness scale had quite low reliability in both the Mensa group and the alumni sample (α ≤ .70).27 For the BFI scale, reliability was acceptable (α = .75).

The current samples' scores on the NEO-FFI scales were compared to norm data from 1,908 representative German adults (community sample) collected by Körner, Geyer, and Brähler (2002).28 The BFI scales were compared to a representative sample of around 1,450 German adults collected by Lang and Lüdtke (in preparation). As can be seen in Table 7, all tested samples were very high in openness, placing them on average in the 94th percentile29 of the population. One possible reason is that the current samples were characterized by high levels of intelligence and education, which are correlates of openness to experience (Ashton, Lee, Vernon, & Jang, 2000; Gignac, Stough, & Loukomitis, 2004). A second possibility is that the offer of personal feedback to participants attracted more psychologically minded people high in openness. However, despite the extreme mean values, Samples 1, 2, and 4 did not seem to be restricted in range, as evidenced by the fact that the standardized SD was close to 1. Only in Sample 3 did the openness scale show a slight restriction in range.

3.3.4 Control Variables

Big Five (Samples 1-4)


In all samples, extraversion, neuroticism, agreeableness, and conscientiousness (i.e., the remaining four Big Five factors) were assessed as control variables. As can be seen in Table 7, the reliability of these four scales was acceptable to good. Across all samples and factors, the data were quite comparable to the norm data, except for neuroticism in Sample 3, where the mean score trailed almost one SD below the population mean. On average, the studied samples were somewhat more extraverted, emotionally stable, and agreeable than the corresponding norm groups. The four remaining traits were in no way restricted in range, with values tending instead towards somewhat higher diversity. Comparisons between Samples 1, 2, and 4 (i.e., the samples that completed the NEO-FFI) showed significant differences in extraversion, agreeableness, and conscientiousness, Fs(2, 710) > 17, ps < .01. Planned contrasts showed that this was due to the lower extraversion and agreeableness of the Mensa sample compared to the alumni and laboratory participants, Fs > 13, ps < .01, whereas the university alumni were more conscientious than the other two samples, Fs > 20, ps < .01.

Table 7 Psychometric Properties and Normed Scores of Big Five Scales Across Samples

[Table body not reproduced: the table reports, for each sample, the Big Five scale statistics together with a Mean z row per sample and an aggregated mean z row across samples.]

Note. E = Extraversion, N = Neuroticism, O = Openness, C = Conscientiousness, A = Agreeableness
a Norm data by Körner, Geyer, and Brähler (2002)
b Norm data by Lang and Lüdtke (in preparation)

3.3.5 Coding of MU From Behavioral Observations in Study 4

The videotaped material from Study 4 was used to assess aspects of both the individual participants’ personality and the dyadic interaction that unfolded between them. In general, two kinds of procedures can be used. First, it is possible to code distinct (molecular) behaviors that occur in an interaction. Such behaviors have the advantage that they are relatively unambiguous and easy to code. Second, it is possible to use impression ratings to assess interaction quality. Such a procedure allows judges to use all available information in an interaction (e.g., frequent smiling, touching) and integrate it into a composite rating (Cappella, 1997). Indeed, in judging rapport, this method has been recommended because of its cost-efficiency and accuracy (Bernieri & Rosenthal, 1991). For this reason, the latter method was used to assess the level of MU from the video observations.


Mutual understanding was coded by two student assistants, who were trained in the use of the coding system by the present author, using 8 interaction halves as stimuli to ensure adequate reliability and internal validity. Both the training procedure and the eventual coding took place in a room with a video projector (with sound). The student assistants were equipped with computers into which they entered their ratings. The videotape was advanced to the point where the interviewer speaks the opening sentence ("About which life domain do you want to talk first?"), which served as the anchor for the start of the interaction. For every 30 seconds of the interaction, coders rated the amount of MU during the time frame that had elapsed.

The coders were instructed to rate their impression of the amount of understanding the interviewee would feel during the relevant 30-second interval. To achieve this, they were told to rely on two sources of information. First, they were to use the observable reactions of the interviewee. If the interviewee seemed comfortable while talking about his or her life domains, the relevant interval was rated as higher in understanding than if the interviewee was visibly strained and uncomfortable. Such reactions could also consist of nonverbal behaviors, such as an "open" body posture or an interested facial expression. Second, they were to take the perspective of the interviewee and assess the amount of felt understanding they themselves would experience given the interviewer's behavior during the interval.

Perceived understanding was rated on a 1 (extremely misunderstood) to 7 (extremely understood) Likert scale. Intervals in which no rating was possible (e.g., because of inadequate audio quality) were assigned a missing value. Because reliability across the 30-second intervals was adequate (α = .78 across coders), both coders' judgments were combined into a composite index of perceived understanding. After each 10-minute interaction, coders discussed their ratings with each other to re-calibrate rating criteria if necessary (they did not, however, change any ratings in retrospect). When the mean values of all intervals were aggregated across the separate interaction halves, a highly reliable composite resulted (α = .88).
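The composite-reliability computation used throughout this chapter can be illustrated with a generic Cronbach's alpha function; the two "coders" below are simulated for illustration and are not the study's data.

```python
import numpy as np

def cronbach_alpha(ratings):
    """Cronbach's alpha; rows = units (e.g., intervals), columns = raters/items."""
    k = ratings.shape[1]
    item_variances = ratings.var(axis=0, ddof=1).sum()
    total_variance = ratings.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

rng = np.random.default_rng(7)
truth = rng.normal(4, 1, 100)            # latent understanding per interval
coder_a = truth + rng.normal(0, 0.6, 100)  # each coder adds independent error
coder_b = truth + rng.normal(0, 0.6, 100)
alpha = cronbach_alpha(np.column_stack([coder_a, coder_b]))
print(round(alpha, 2))
```

With two raters, alpha reduces to the Spearman-Brown step-up of the inter-coder correlation, which is why averaging the two judgments yields a more reliable composite than either coder alone.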

3.4 Analysis Strategy


In addressing the hypotheses outlined above, multiple statistical techniques were used. First, because of the large number of variables that were assessed, factor-analytic techniques were used to create composite scores (see Section 3.4.1). Second, many variables assessed in the current study were hierarchically related. To account for this "nested" structure, Hierarchical Linear Modeling (HLM; Bryk & Raudenbush, 1992) was used where appropriate (see Section 3.4.2). Third, Section 3.4.3 describes the calculation of difference scores and profile similarities to test the dyadic effect hypotheses and discusses some of the statistical difficulties of this approach. Finally, Section 3.4.4 briefly touches upon the logic underlying the extreme-group comparisons.

3.4.1  Data Reduction

To reduce the number of independent and dependent variables, exploratory factor analyses (principal component analysis with varimax rotation) were conducted to create composite scores whenever possible. Following conventional criteria (eigenvalues > 1, scree plot inspection), it was tested whether the observed associations between variables could be summarized by one or more latent factors. Whenever the pattern of factor loadings was sufficiently clear-cut (i.e., high primary, low secondary loadings), factor scores were used in subsequent analyses.
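The eigenvalue criterion can be sketched on simulated data (six variables, three of which share a common factor). This illustrates the decision rule only; it does not reproduce the study's analyses.

```python
import numpy as np

rng = np.random.default_rng(1)
factor = rng.normal(size=(300, 1))
data = rng.normal(size=(300, 6))
data[:, :3] += 2 * factor              # variables 1-3 load on a common factor

R = np.corrcoef(data, rowvar=False)    # correlation matrix of the variables
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]  # eigenvalues, descending
n_retained = int((eigvals > 1).sum())  # Kaiser criterion: eigenvalue > 1
print(eigvals.round(2), n_retained)
```

Because the matrix is a correlation matrix, the eigenvalues sum to the number of variables; components with eigenvalues above 1 explain more variance than a single standardized variable, which is the rationale behind the criterion.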

3.4.2 Nested Structure of Data

A nested data structure occurs when observations are hierarchically organized, so that units on the lower-order level of the hierarchy can be characterized by their membership in some higher-order category (see Appendix 7.9 for a schematic depiction). An often-used example of such a structure comes from educational psychology, where pupils (Level 1) are nested in classes (Level 2) that are in turn nested in schools (Level 3). In the current study, the social relationships assessed with the ego-centered network instrument are nested within participants. Specifically, they are organized according to two hierarchical levels: Level 1 consists of the network partners within an ego-centered network, whereas the Level 2 units are the individual participants ("egos").


According to a number of authors, HLM is ideally suited to deal with such hierarchical data because it accounts for interdependencies between the different levels (Cooper, 2002; Gonzalez & Griffin, 2002; van Duijn, van Busschbach, & Snijders, 1999). To illustrate the necessity of a multilevel approach in analyzing the data from the current study, consider the following example, in which individuals differ in the calibration of the scale they use to rate people's intelligence. Individuals who have calibrated their intelligence ratings around a mean scale level of 17 may rate themselves with an 18 (+1), a friend with 15 (-2), and a colleague with 19 (+2). By comparison, individuals who have calibrated their ratings around a mean scale level of 12 may use the same relative rating pattern but arrive at different absolute values: self-rating = 13 (+1), rating of friend = 10 (-2), and rating of colleague = 14 (+2). If the differences in calibration are a result of error variance and not of differences in the network partners' "true" intelligence levels, ignoring the nested structure of the network ratings would result in falsely treating the highly intelligent colleague in the low-calibrated network as less intelligent than the less intelligent friend in the high-calibrated network.
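The calibration example can be written out numerically: centering the ratings within each network recovers the identical relative pattern, which is the separation of levels that HLM formalizes.

```python
import numpy as np

high = np.array([18.0, 15.0, 19.0])  # self, friend, colleague (anchor near 17)
low = np.array([13.0, 10.0, 14.0])   # same +1/-2/+2 pattern, anchor near 12

# ignoring the nesting, the low-calibrated colleague (14) looks less
# intelligent than the high-calibrated friend (15)
print(low[2] < high[1])  # True

# centering within each network (the Level 2 unit) recovers the same pattern
print(np.allclose(high - high.mean(), low - low.mean()))  # True
```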

HLM differentiates between multiple, hierarchical levels of the data. For every Level 2 unit, a separate Level 1 regression is estimated, which typically includes an intercept (β0), at least one coefficient that describes the association between an independent and dependent variable (β1), and an error term (random coefficient = r).30 For example, the relation between MU and intelligence for network partner i of participant j could be described as follows:

MUij = β0j + β1j*IQij + rij    (1)



where β0j is the average31 level of MU reported by person j, β1j is the average relation between IQ and MU across j’s social network, and rij is the difference between the corresponding observed and predicted values for network partner i.

As stated above, a crucial feature of HLM is that it uses the Level 1 β-coefficients as outcomes in an additional regression equation at Level 2 (see Appendix 7.8). In other words, the program allows the user to specify a separate Level 2 regression from which to predict the Level 1 coefficients. For example, it can be tested whether there is a difference between men and women in their social networks’ average level of MU or in the association between network partner IQ and MU. This would result in the following Level 2 equations:


β0j = γ00 + γ01*(GENDERj) + u0j    (2)


β1j = γ10 + γ11*(GENDERj) + u1j    (3)


where γ00 is the average β0j coefficient in the sample (i.e., the average level of MU across all participants), γ01 is the moderating relation between gender and the β0j intercept (this provides information regarding gender differences in the average level of MU), γ10 is the average association between IQ and MU, and γ11 is the moderating relation between gender and the IQ-MU association (for example, it may be that only men perceive a link between relationship quality and partner intelligence, whereas women do not).

Note that Equations 2-3 include two Level 2 error terms: u0j for each individual participant's residual variance in β0j, and u1j for the residual variance in β1j. When these error terms are significant, there are individual differences in the Level 2 parameters that are not explained by the variables in the regression model. In HLM terms, Level 1 coefficients that are allowed to vary across Level 2 units (independently of Level 2 covariates) are called random effects, whereas those that are the same for all Level 2 units are called fixed. In the above example, the coefficient specifying the association between IQ and MU (β1j) is a random effect because the error term u1j allows this effect to differ across participants. In contrast, had the individual-specific error term been omitted, the effect of IQ on MU would be termed a fixed effect (even though it would still be allowed to vary according to gender).
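The logic of Equations 1-3 can be illustrated with a naive two-step "slopes-as-outcomes" sketch on simulated data: a separate Level 1 regression per ego, followed by a Level 2 regression of the resulting coefficients on gender. HLM proper estimates both levels simultaneously with Empirical Bayes shrinkage (see Footnote 30); this sketch only mirrors the logic, and all numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(42)
n_egos, n_partners = 60, 10
gender = rng.integers(0, 2, n_egos).astype(float)

b0, b1 = [], []
for j in range(n_egos):                 # Level 1: one regression per ego
    iq = rng.normal(0, 1, n_partners)
    # true model: intercept 4 + 0.4*gender, IQ slope 0.3
    mu = (4 + 0.4 * gender[j]) + 0.3 * iq + rng.normal(0, 1, n_partners)
    X = np.column_stack([np.ones(n_partners), iq])
    beta = np.linalg.lstsq(X, mu, rcond=None)[0]
    b0.append(beta[0]); b1.append(beta[1])

# Level 2: regress the per-ego intercepts and slopes on gender
G = np.column_stack([np.ones(n_egos), gender])
g0 = np.linalg.lstsq(G, np.array(b0), rcond=None)[0]  # [gamma00, gamma01]
g1 = np.linalg.lstsq(G, np.array(b1), rcond=None)[0]  # [gamma10, gamma11]
print(np.round(g0, 2), np.round(g1, 2))
```

With these simulated values, g0 should recover roughly [4, .4] and g1 roughly [.3, 0], i.e., a gender difference in the MU intercept but not in the IQ-MU slope.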

3.4.3 Testing Dyadic Effects


Dyadic effects are dependent on indices of similarity. Difference scores are the most basic index of interpersonal similarity, but they have been criticized for combining the measurement error of both constituent elements, rendering them less reliable (Burr & Nesselroade, 1990). It should be noted, however, that the bad reputation of difference scores mostly originates from their use in longitudinal research. As shown by Burr and Nesselroade (1990),32 the reliability of a difference score is reduced when its components covary. In longitudinal research, it is the general rule that people's personality scores at two points in time are substantially correlated. In dyadic research, however, the degree of association between two individuals' personality scores is variable, ranging from complete interdependence to complete independence. In the latter case, difference scores are just as reliable as their constituent components (see Section 4.1.4).

The calculation of profile similarity indices has been recommended as an alternative to difference scores (Cronbach & Gleser, 1953). Such indices aggregate the differences between two or more variable pairs (e.g., by means of calculating the Euclidean distance between two persons in a multidimensional space of k variables). When similarity in more than one variable is the focus of analysis, the current study will use such indices (e.g., in the case of the k = 17 Rokeach values). When similarity is the independent variable, as in the current study, a disadvantage of using profile similarity indices is that they are unspecific with regard to the explanatory power of each of their constituting elements. For example, if a dyad that is very dissimilar in terms of the Five-Factor Model experiences a lower level of MU, it is not clear whether this is related to differences in 1) extraversion, 2) neuroticism, 3) openness, 4) conscientiousness, 5) agreeableness, 6) several, or 7) all of the above. This poses a problem for testing the hypotheses of the current study that postulate an effect of specific dyadic differences on the MU process. In such cases, the only viable option is to use difference scores.
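Both families of indices can be sketched for the k = 17 value rankings; the profiles below are artificial boundary cases used only to show how the indices behave.

```python
import numpy as np

ego = np.arange(1, 18)        # ranks 1..17
partner_same = ego.copy()     # identical value profile
partner_opposite = ego[::-1]  # maximally reversed profile

def euclidean_distance(a, b):
    """Profile dissimilarity as Euclidean distance in k-dimensional space."""
    return float(np.sqrt(((a - b) ** 2).sum()))

def profile_correlation(a, b):
    """Profile similarity as the correlation between two profiles."""
    return float(np.corrcoef(a, b)[0, 1])

print(euclidean_distance(ego, partner_same))            # 0.0
print(round(profile_correlation(ego, partner_same), 6))      # 1.0
print(round(profile_correlation(ego, partner_opposite), 6))  # -1.0
```

Note that both indices collapse the k variables into a single number, which is exactly the property criticized in the text: a low similarity value does not reveal which of the constituent variables drives it.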

A final note needs to be made regarding the calculation of intelligence differences between the gifted individuals and their network partners in Sample 1. Because these individuals are located at the extreme high end of the intelligence distribution, the magnitude of interpersonal intelligence differences depends almost entirely on the intelligence of the interaction partner. Because, for Mensa members, there was thus no effective difference between the (rated) absolute intelligence level of a partner and the relative intelligence difference (r = -.94, p < .01), intelligence differences were not calculated for this sample.

3.4.4 Group Comparisons


One hypothesis of the current study was that gifted individuals experience less MU in their social relationships than control individuals. For this purpose, Sample 1 was compared to the highly and averagely achieving university alumni of Sample 2. These samples clearly differ on many dimensions other than intelligence. As can be seen in Table 3, Mensa members were older on average, had a slightly lower educational level, and were more likely to use the Internet questionnaire (compared to the P&P version). To account for these differences, these factors were entered as covariates in the corresponding analyses (mainly GLM).

Footnotes and Endnotes

16  For logistical reasons, the number of questionnaires mailed to invalid addresses could not be calculated for the other waves.

17  Because psychologists were thought more likely to have background knowledge regarding the constructs under study, reliance on their participation was avoided as much as possible. Therefore, no flyers were distributed in places that are primarily visited by psychology students.

18  Contact persons returned the questionnaires directly to the Psychological Institute, without further mediation by the primary participants.

19  Participants were instructed that these life domains did not need to be identical with the list of values of the Rokeach Value Survey.

20  It was also stressed that the interaction did not serve to draw conclusions about the participants' level of ability; rather, the study examined factors that play a role in human communication.

21  For example, Paulhus et al. (1998) used the Wonderlic intelligence test; instead of the typical SD of 7.1, they reported a SD of only 4.6.

22  rxy´ = rxy*(σ/SD) / √(1 − rxy² + rxy²*(σ/SD)²), where rxy´ is the corrected correlation, rxy is the uncorrected correlation, σ is the unrestricted "true" standard deviation, and SD is the corresponding observed value.

23  rxy´ = rxy / √(rxx*ryy)

24  In most cases, this was the Mensa admission test. Note that the IQ tests were taken at different time points, which might have influenced results because secular gains in intelligence (Flynn, 2003) lead to an underestimation of average population performance when older test norms are applied.

25  Items loading on more than one factor reduce the variability of the different scales, resulting in relatively flat profiles.

26  The negative correlation was expected because of the forced ranking procedure (e.g., if a value gets a rank of one, other values will automatically get lower ranks).

27  Inspection of the matrix of item-total correlations revealed that this was mainly due to the negative item-total correlation of Item 8: "I think we should pay more attention to the opinion of our religious authorities in making ethical decisions". Contrary to the intended purpose, the open individuals in Samples 1-2 generally agreed with this item, perhaps because of its emphasis on complicated ethical problems.

28  The Körner et al. (2002) data were deemed superior to data reported in the German NEO-FFI handbook (Borkenau & Ostendorf, 1993) that rely on a sample that is biased towards younger adults.

29  Estimate based on an average z-score of 1.55.

30  In HLM, the error term is not simply the difference between the predicted and observed score. Rather, the program uses an Empirical Bayes (EB) estimation strategy that optimally integrates the coefficients gained from an ordinary least squares (OLS) regression of the Level 1 units with the values of these coefficients as predicted by the Level 2 equation. This is done while taking the Level 1 data quality into account. Specifically, Level 2 units that contribute very few Level 1 data points are given less weight (shrinkage). Especially when the number of Level 1 units is small, this method produces superior results (Raudenbush, 1988).

31  The interpretation of this and all other parameters depends on the scaling of the raw data. Like any other regression approach, HLM estimates the beta coefficient for one variable while controlling for all other variables in the model. The intercept thus conforms to a situation where all other parameters are set to zero. If the variables in the model are centered, this approximates the “average case”. When the model includes dummy variables, the intercept corresponds to the average case of non-members of the dummy categories (e.g., when female gender is coded with 1 and male gender with 0, then the intercept corresponds to the average male).

32  rdiff = (s1²*r1 + s2²*r2 − 2*s1*s2*r12) / (s1² + s2² − 2*s1*s2*r12),
where rdiff is the reliability of the difference score, s1 and s2 correspond to the SDs of the constituting variables, r1 and r2 correspond to the reliabilities of the constituting variables, and r12 is the correlation between them.
