Spiekermann, Sarah: Online Information Search with Electronic Agents: Drivers, Impediments, and Privacy Issues

Chapter 6. Consumer Privacy Concerns in Interacting with Agents

6.1 Introduction to Privacy Issues in Online Interactions

A number of researchers in agent technology have pointed to privacy as a central factor in users’ acceptance of agents [Shearin, 2000; West et al., 2000; Norman, 1994]. Norman, for example, stated: “Privacy and confidentiality of actions will be among the major issues confronting the use of intelligent agents in our future of a fully interconnected, fully communicating society” [Norman, 1994, p.70]. The belief of academics that online privacy is a major design issue and a potential impediment to agent use is founded on household surveys that confirmed people’s concern to maintain privacy online [Pew Internet & American Life Project, 2000; Ackerman et al., 1999; Hoffman et al., 1999]. Many scholars have also presented evidence that online users wish to have control over the data they leave behind in electronic environments [Shearin and Maes, 2000; Hoffman et al., 1999]. In addition, privacy, or ‘the right to be let alone’, has historically been considered a fundamental right of people [Warren and Brandeis, 1890] and has found its way into countries’ legal systems.<40>

On the other hand, customer information has become a strategic asset for companies, allowing them to leverage the benefits of one-to-one marketing practices [Kenny and Marshall, 2000; Reichheld and Schefter, 2000]. As a result, companies have an interest in creating personal profiles of their customers and web site visitors. Many Internet business models are built on customer information as a major asset, and some online services even offer “freebies” or other incentives in exchange for customer information [Chang et al., 1999, p.85].



Given these apparently conflicting interests of online marketers and consumers, Hagel and Rayport noted as early as 1997 that there will be a “coming battle for customer information” [p.53], and it is as yet unclear how it will be resolved. One important question in this battle will certainly be how valuable private information really is to consumers. Most privacy surveys conducted so far have been based exclusively on people describing their general attitudes towards the subject [Pew Internet & American Life Project, 2000; Ackerman et al., 1999; Westin, 1996]. Few insights have been gained, though, into the way consumers actually behave online. Some studies suggest that people are willing to give away private information for appropriate returns [Hagel and Rayport, 1997; Chang, 1999]. Other studies on social factors in human-computer interaction have shown that people often treat computers as they treat other human beings [Moon, 1998; Nass et al., 1995] and, as a result, can be led to disclose a lot about themselves if the machine responds appropriately [Moon, 2000].

Are online users/consumers really as concerned about their privacy as is widely believed? How do they value their private information? And how do online users deal with their privacy when they receive high-value personalized product recommendations in exchange? These questions are important for understanding the role of privacy in agent interactions.

The shopping experiment was ideal for investigating these questions. First, it offered the possibility of measuring not only privacy concerns, but also actual behavior. Second, participants were put in a second-generation electronic commerce type of environment where they would receive a benefit for data revelation: a personalized agent recommendation. Against this background, it was investigated to what extent stated privacy concerns and preferences really are impediments to agent interaction. In doing so, it was assumed that agents are operated by marketers (web site hosts) and that, consequently, profile ownership does not remain with the customer.



6.2 Measuring Disclosure in Human-Agent Interaction

During the shopping session, agent Luci gave participants the opportunity to answer 56 purchase-related questions. Given that a successful offline purchase process was shown to involve only 3.3 questions discussed between a human sales agent and a customer [Haas, 2001], it was expected that most experimental participants would not exhaust the full set of 56 agent questions. Moreover, it was believed that the degree of privacy concern would be reflected in the number of agent questions answered by participants.

However, taking only the number of agent questions answered as a measure of participants’ degree of disclosure would have had one major drawback: it would have assumed all information revealed by shoppers to be of the same value to them. Answers would thus have been valued irrespective of their importance and legitimacy.

In order to avoid this simplification and to better reflect participants’ perceived revelation during the shopping session, it was decided to develop a new measure. This measure aims to approximate the degree of perceived self-disclosure observable in human-agent interaction. What has been missing from research up to now, though, is insight into the very way in which people evaluate their private data. As Hine and Eve stated in 1998 [p.253]: “Despite the wide range of interests in privacy as a topic, we have little idea of the ways in which people in their ordinary lives conceive of privacy and their reactions to the collection and use of personal information.”

Studies that have explored the phenomenon of private information revelation online have done so focusing solely on the provision of single data units (such as the provision of an e-mail address), but have reflected little on the context in which information units are requested on the Internet (see e.g. [Ackerman et al., 1999]). However, as Badenoch et al. [1994] summarize, the “value [of information] is almost entirely dependent on the specific circumstances in which the information will be used” [p.24]. A central aspect of information valuation in our model is therefore the context in which information is given. Context has long been recognized in the information science literature as one of the most decisive factors in valuing an information unit [Badenoch et al., 1994; Hine and Eve, 1998]. For example, in one context users might perceive the provision of their telephone number as a necessity and are therefore quite willing to give it away (no/little cost). In other contexts, they might regard the provision of the telephone number as an unnecessary intrusion into their privacy and will only reluctantly provide it (high cost).

Since classical information search analysis is often based on a cost-benefit tradeoff (made by actors when determining behavior) [Moorthy et al., 1997; Stigler, 1961], the idea was introduced that online consumers incur a cost of search when interacting with agents. We called this cost ‘personal consumer information cost’ (PCIC) [Annacker et al., 2001]. It is perceived by consumers when they reveal truthful information about themselves on the Internet while knowing that afterwards some parts of their identity and personal profile will be known to the organization hosting a site (and expecting that their data will probably be used for further analysis or for sale).

The challenge in developing a model for this construct of private information cost was that no tangible value is capable of representing it appropriately. There is usually no cost involved in producing private information, and the economic freebies or services offered so far in exchange for it differ strongly in value [Chang et al., 1999]. Our model therefore focuses more on the identification of some overall variables driving PCIC and on their interrelations. It can serve as an approximation of the likely perception of an information request that could be made by an online agent.

6.2.1 Independent Variables Driving Personal Information Cost on the Internet

PCIC has been developed against the background of disclosure to selected-option-based dialogue systems. Personal consumer information cost (PCIC) stands for the loss in utility a consumer perceives when giving away a truthful information unit about himself to such a system, hosted by a third party. This third party is an entity with which the consumer has no personal relations and for which high levels of trust have not been established. An example of such a third party could be the host of a web site. PCIC expresses itself in a consumer’s reluctance to answer the question of an interface agent in the context of an online search process for products. Strong reluctance stands for high information cost. In contrast, if a user has no problem revealing an information unit about himself, he incurs little cost.

As determining PCIC means attributing value to different types of information units, research in information theory provided a starting point for modeling. Considerable research has been done on the valuation of information in management science (see [Badenoch et al., 1994, p.59] for an overview). None of these approaches is directly transferable to the current context. This is because traditional theories of information value have a different perspective on value creation: while they are concerned mostly with the benefits for the recipient of information compared to the production cost of this benefit, the current context relates more to the cost of providing an additional unit of personal information, a provision that at the same time leads to no measurable production cost. Yet some principal theoretical constructs of information valuation can still be transferred to the current context, notably the influence of the context on information value, the relevance the information unit holds in this context, and the effort required to process it [Badenoch et al., 1994].

The context in which an information unit is demanded can influence the perception of PCIC. A practical example may illustrate this: Let us assume a buyer who wants his goods to be delivered to his home. He will probably be quite open to providing his address to the supplier. The delivery context creates the necessity to provide the address and thus legitimizes its provision. If, in contrast, the customer picked up the ordered products himself, he would probably be surprised if he had to leave his address with the vendor, for there is no obvious contextual need for this information. It is likely that he would be reluctant to provide it. The example shows that the perceived legitimacy of an information request in a specific context drives the perceived cost of providing it. As Hine and Eve put it [1998, p.257]: “Requests for information not deemed necessary in order to carry out this function were deemed intrusive.” Perceived question legitimacy therefore represents one dimension of the PCIC evaluation model that has been developed. It is defined as the degree to which a question is perceived as justified in a given context.

The legitimacy of an information request is determined not only by the context, but also by its importance in that context. In the above example, providing the delivery address is very important for the fulfillment of the service. It is therefore intuitive to argue that the buyer perceives little cost in providing it. Yet there may be other legitimate information units in the delivery context which are less important and are thus perceived as more costly to provide, for example the telephone number of the product recipient or his working hours. The perceived importance of an information unit in a specific context thus also has an impact on the perception of PCIC. For modeling purposes, importance is defined as the perceived degree to which an information request can contribute to an optimal product or service experience. At the same time, while importance drives the legitimacy of an information request, the opposite does not hold true. For example, asking the buyer of a winter jacket what type and color of buttons he prefers may be a legitimate question in the purchase context, but will probably not be important to most consumers.

Finally, it has been recognized in the literature that the effort required to process information also leads to cost for consumers [Bettman, 1979]. There may be information requests online that are difficult for users to answer, and as a result users may be reluctant to answer them. This would be the case, for example, if a shopping agent asked for the desired size of a hard disc in gigabytes, but the user did not know what a hard disc is. The perceived difficulty of answering a question represents the third dimension of the proposed PCIC evaluation model.

The three main drivers of PCIC, identified as the perceived legitimacy, importance and difficulty of providing an information unit in a specific online sales context, are summarized in Figure 8. They are at the core of the empirical investigations presented hereafter. Certainly, they are not able to explain the phenomenon of PCIC in its entirety. Individual differences, for example in the level of trust in online providers, online privacy attitudes, product experience etc., may also drive the level of PCIC. Yet, as will be shown below, the three variables examined represent a good starting point for capturing online users’ disclosure concerns in online purchase situations.

Figure 8: Drivers of Personal Consumer Information Cost (PCIC)

6.2.2 Empirical Survey Design

In order to investigate the hypothesized drivers of PCIC, an empirical survey was conducted on how requests for different information units would drive consumers’ perception of PCIC. Thirty-nine subjects were invited to the university laboratory at Humboldt University Berlin and were asked to judge the 112 agent questions employed by the electronic shopping agent Luci (56 questions per product).<41>

The 112 agent questions and multiple-choice answer options were displayed one by one to subjects on the left side of a computer screen. Subjects were asked to imagine that the questions displayed to them would be asked by an electronic shopping agent on the Internet in the context of a purchase process for either winter jackets or compact cameras. On the right side of the screen, 11-point scales (ranging from 0 to 10) simultaneously asked subjects to judge each question’s legitimacy and importance in the sales context, the difficulty of answering it, as well as the overall perceived information cost (for a screenshot of the rating tool used see Appendix B6). The construct of information cost was explained to the participants in advance of the rating sessions through a text-based briefing which used the following definition of PCIC: Information Cost here stands for the ‘intuitive readiness’ to truthfully answer the question of the search engine; thus the spontaneous feeling of whether you would be willing to reveal the demanded information about yourself. ‘No’ Information Cost would mean that you have no problem at all answering the question truthfully. ‘Very high’ Information Cost stands for the feeling that under no circumstances would you give this type of information about yourself to a search engine (for the full details of the participant briefing see Appendix A6).

6.2.3 A Model for Personal Consumer Information Cost (PCIC)

For modeling purposes one outlier had to be excluded from the initial number of 39 observations. The model presented hereafter is therefore based on 38 observations.

6.2.3.1 Initial Regression Analysis

The relationship between PCIC as the dependent variable and legitimacy (Leg), importance (Imp), and difficulty (Diff) as independent variables was initially expressed as:

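$$\mathit{PCIC}_{ij} = \beta_0 + \beta_1\,\mathit{Leg}_{ij} + \beta_2\,\mathit{Imp}_{ij} + \beta_3\,\mathit{Diff}_{ij} + \varepsilon_{ij} \qquad (1)$$

where $i = 1, \dots, I$ indexes respondents ($I$: number of respondents) and $j = 1, \dots, J$ indexes questions ($J$: number of questions).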

As ordinary least squares analysis of this model (1) resulted in a relatively low R² of .439 for pooled data, F(3, 4252) = 1108.69, p < .01, an alternative model was estimated in which unobserved heterogeneity was captured by dummy variables for each respondent (table 10).
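The link between the reported F statistic and the fit of the pooled model can be checked with a short calculation. The sketch below is an illustration added here, not part of the original analysis; it assumes the standard identity between the overall F test of a linear regression and R², and the function name `r_squared_from_f` is hypothetical.

```python
def r_squared_from_f(f_stat: float, df_model: int, df_resid: int) -> float:
    """Return the R^2 implied by the overall F test of a linear regression.

    Uses the identity F = (R^2 / df_model) / ((1 - R^2) / df_resid),
    solved for R^2.
    """
    explained = f_stat * df_model
    return explained / (explained + df_resid)


# Pooled OLS fit of model (1) as reported in the text: F(3, 4252) = 1108.69
pooled_r2 = r_squared_from_f(1108.69, 3, 4252)
print(round(pooled_r2, 3))
```

Under this assumption the pooled F statistic implies an R² of about .439, in line with the value stated in the text. The same function can be applied to the F statistic of the fixed effects model, but the result would be a derived figure rather than one reported by the source.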



Table 10: Results for an Initial Fixed Effects Regression Model for the Evaluation of PCIC

Overall model fit: Adj. R² = …; F(40, 4215) = 173.80, p < .01

Parameter estimates (dependent variable: PCIC)

Independent variable    Parameter    (Std. error)    Sig.
Intercept               6.252
Leg                     -.559        (.017)          ***
Imp                     -.011        (.018)
Diff                     .138        (.014)          ***

( ) standard error; *** p < .01

Since the data consists of partially dependent observations, controlling for these dependencies might lead to slightly lower levels of significance.

As can be seen from table 10, the fit of model (1) was considerably improved by accounting for individual differences in question judgment. The signs of all parameters supported the expectation that legitimacy and importance lead to a reduction in PCIC, while the difficulty of an information request influences it positively. Surprisingly, however, the impact of perceived question importance turned out not to be significant. Investigating this result in more detail, a typical case of co-linearity was discovered in the data, with a bivariate correlation of .825 between Leg and Imp. Co-linearity diagnostics suggested a borderline case of co-linearity, with the largest condition index (18.50) being above 15 (see [Belsley et al., 1980] for more details on this type of problem).
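To give a rough sense of what a correlation of .825 between two predictors means for estimation precision, the following sketch computes a variance inflation factor (VIF). This is an illustration added here and not the diagnostic used in the study, which relies on condition indices. It assumes, for simplicity, that Leg and Imp are the only two predictors involved, ignoring Diff and the respondent dummies; the function name `vif_two_predictors` is hypothetical.

```python
def vif_two_predictors(r: float) -> float:
    """Variance inflation factor of a predictor that is correlated with
    exactly one other predictor (simplifying assumption).

    With a single other predictor, the R^2 of regressing one on the other
    equals the squared bivariate correlation, so VIF = 1 / (1 - r^2).
    """
    if not -1 < r < 1:
        raise ValueError("correlation must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r ** 2)


# Bivariate correlation between Leg and Imp as reported in the text
vif_leg_imp = vif_two_predictors(0.825)
print(round(vif_leg_imp, 2))
```

Under this simplification the sampling variance of each of the two coefficients is roughly three times what it would be with uncorrelated predictors, which is consistent with the text's characterization of the case as borderline.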



One way to address the problem of co-linearity in regression analysis is to formalize the relationship between the two related variables [Darnell, 1995]. It was therefore decided to explore the relationship between Leg and Imp in more detail (figure 9).

Figure 9: Relationship between Mean Perceived Legitimacy and Importance of Agent Questions

6.2.3.2 Relationship between Legitimacy and Importance of Information Requests

In order to allow for better interpretation of the data and to visualize the relationship between perceived legitimacy and importance, the data were aggregated by computing mean values of both variables (Leg and Imp) for all questions across the 38 subjects. Figure 9 gives an overview of the observations made. The graphical presentation of the data suggests that, besides a strongly apparent linear relationship between the legitimacy and importance of interface questions, mean judgments can be separated into two distinct groups: for questions in the lower left corner (represented by graph B), an increase of one scale point in importance seems to correspond to a similar increase in legitimacy. In contrast, for questions in the upper right corner the increase in legitimacy is noticeably smaller (graph A).



In order to analyze the nature of these two apparently distinct relationships, the nature of the questions was included in the interpretation. As was discussed in section 3.3.4, questions were purposefully designed to represent four different categories (for more detail see Appendix B5): 1) non-private questions (pd) addressing specific attributes sought in the product (e.g.: How resistant do you want the fabric of the jacket to be?), 2) marginally private questions (pepr) that referred to the consumer in person, but were also closely linked to product choice (e.g.: How important is the resistance of the fabric of jackets to you?), 3) relatively private questions (u) looking into the usage envisaged for the product (e.g.: Where do you want to wear the jacket?) and 4) purely private questions (peip) that would somehow be related to the sales context, but be completely irrelevant for product choice (e.g.: Where do you obtain your knowledge about fashion? in the purchase context for jackets). Transferring this typology to the two distinct graphs (A and B), it is interesting to note that the questions in group A (represented by graph A) are primarily product-related questions (pd) as well as person-oriented questions with a product focus (pepr). At the same time, group B (represented by graph B) consists mostly of questions focusing on personal attributes (peip) or usage (u). This finding suggests that the legitimacy of a product-related question (A) may be less driven by its importance than is the case for a more personal question. Put differently, the legitimacy of personal agent questions appears to be more strongly driven by their perceived importance in the purchase context.

To go into more detail, the Leg and Imp scales were each divided into three sections of equal width (0 - 3.33, 3.34 - 6.66, 6.67 - 10), creating 9 different classes for Leg x Imp. As can be seen in figure 9, only 5 classes are relevant to the analysis: class 7, containing questions of low legitimacy and importance; classes 2 and 3, containing in contrast highly legitimate and important questions; and class 5, where legitimacy and importance are medium. Class 4, which only contains two items, appears negligible for the discussion. Table 11 gives an overview of how the 4 question categories (pd, pepr, u, peip) relate to the perceived legitimacy and importance frame in figure 9. This table has strong limitations, as some of the cross-tabulation categories contain a very small number of observations. However, it still provides some valuable insights and hints for future research on this subject, which is why it was included in the analysis.
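The classification into Leg x Imp classes can be sketched in a few lines. The numbering of the nine classes is not spelled out in the text; the sketch below assumes a 3 x 3 grid numbered row by row, with rows running from high to low legitimacy and columns from low to high importance. This assumption reproduces the descriptions given for classes 2, 3, 5 and 7, but it remains an inference, and the function names are hypothetical.

```python
LOW_MAX = 10 / 3   # upper bound of the lowest third of the 0-10 scale (about 3.33)
MID_MAX = 20 / 3   # upper bound of the middle third (about 6.67)


def scale_third(score: float) -> int:
    """Return 0 (low), 1 (medium) or 2 (high) for a mean rating on the 0-10 scale."""
    if not 0 <= score <= 10:
        raise ValueError("ratings are expected on the 0-10 scale")
    if score <= LOW_MAX:
        return 0
    if score <= MID_MAX:
        return 1
    return 2


def leg_imp_class(mean_leg: float, mean_imp: float) -> int:
    """Map mean legitimacy and importance ratings to a class number from 1 to 9.

    ASSUMED numbering (not stated in the source): classes 1-3 are the
    high-legitimacy row, 4-6 the medium row, 7-9 the low row; within each
    row, importance increases from left to right.
    """
    row = 2 - scale_third(mean_leg)   # 0 = high legitimacy, 2 = low legitimacy
    col = scale_third(mean_imp)       # 0 = low importance, 2 = high importance
    return row * 3 + col + 1


# Hypothetical mean ratings for three questions, given as (Leg, Imp)
for leg, imp in [(1.5, 1.0), (5.0, 5.0), (9.0, 8.5)]:
    print(leg, imp, leg_imp_class(leg, imp))
```

With this numbering, a question of low legitimacy and low importance falls into class 7, a medium one into class 5, and a highly legitimate and important one into class 3, matching the class descriptions in the paragraph above.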



Table 11: Relating Nature of Agent Questions to Leg x Imp Classes

As would be expected, 95.2% of product attribute questions (pd) were perceived as highly legitimate by subjects, while over 82.4% of solely person-oriented questions (peip) were perceived as barely legitimate and unimportant. Highly legitimate product questions were spread across classes 2 and 3. Analyzing their nature in more detail showed that class 2 questions ask for product attributes that might be less important to customers in the product choice process (such as the question asking for the type of hood on the jacket or the carrier cord of the camera), while questions in class 3 address product attributes with more choice relevance (such as color and material of the jacket or weight and zoom of the camera).

Looking at the perception of person-oriented questions (peip), it is not surprising to note that people attribute little legitimacy and importance to those questions that only focus on the individual and obviously do not contribute to product or service delivery. As a result, it could be argued that asking for age, address, hobbies or other information on web sites (e.g. through online questionnaires) may not be welcomed by users if there is no reason for it or no contextual relation to the host’s activities. This may be one explanation for people telling lies online when being asked, out of nowhere, to provide demographic data [Grimm et al., 2000; Sheehan and Hoy, 1999]. More research may be useful to confirm this possible finding.

On the other hand, table 11 indicates a relatively high acceptance (56.5%) of questions that, albeit focusing on the person, do have a connection with product selection (pepr-questions). This implies that customers in many cases do not feel annoyed if they are asked personal questions, as long as these relate to the product context. In fact, none of the pepr-questions was perceived as totally illegitimate or unimportant. The same is true for usage-related questions: those that relate somehow to features of the product (like the motifs you want to capture with the camera) are perceived as sufficiently important and legitimate (class 5). On the other hand, those that lack a link to product selection are perceived as rather illegitimate and unimportant.

6.2.3.3 Final Definition of Overall Model

Formal co-linearity diagnostics as well as the strong linear relationship between Leg and Imp depicted in figure 9 led to the conclusion that the validity of the results obtained for the original fixed effects model (1) might be questionable. The model was therefore re-specified and estimated as a simultaneous equation model (2), which solved the problem of co-linearity. More precisely, the relationship observed between Leg and Imp was explicitly specified: in addition to the direct effects of Leg, Imp and Diff on PCIC, a linear relationship between Leg and Imp was included (for detailed model output see Appendix C, table C3).


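$$\begin{aligned} \mathit{PCIC}_{ij} &= \beta_0 + \beta_1\,\mathit{Leg}_{ij} + \beta_2\,\mathit{Imp}_{ij} + \beta_3\,\mathit{Diff}_{ij} + \varepsilon_{ij} \\ \mathit{Leg}_{ij} &= \gamma_0 + \gamma_1\,\mathit{Imp}_{ij} + \zeta_{ij} \end{aligned} \qquad (2)$$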

Again, dummy variables were used to control for individual differences. As was shown above in the graphical analysis, a clear difference exists in the perceived relation of legitimacy and importance for the two question groups A and B. Based on model (2), two group-specific models were therefore estimated in addition to one representing the total sample. Maximum likelihood estimates for the model parameters (table 12) were generated with Mplus [Muthén and Muthén, 1998], a software package for the estimation of mean and covariance structure models (widely known as SEM). Because of the small number of respondents, one might be tempted to reject the application of this methodology in our study. To put this objection into perspective, however, the following facts should be taken into consideration. First, although the sample size is 38, the number of observations is much higher, since multiple data points (112 questions) were collected for each respondent. This results in a total sample size of 4,256 observations. Second, the analysis does not correspond to typical SEM applications where latent variables with multiple indicators are involved. It is therefore questionable whether general minimum sample size recommendations (100 - 200) or rules of thumb developed for these more complex models are applicable to the present study. Third, the ratio of sample size (4,256) to number of free parameters (82) is 52:1, which is considerably above the ratios recommended for obtaining valid parameter estimates and standard errors (see e.g. [Bentler and Chou, 1987]).

Since model (2) has one degree of freedom, alternative overall fit measures for covariance structure analysis have been used in addition to the multiple correlation coefficient (for the interpretation of these fit statistics see for example [Jöreskog, 1993]). As can be seen from table 12, results for the total sample as well as for group A show an excellent fit according to the RMSEA fit indicator [Browne and Cudeck, 1993; Hu and Bentler, 1999]. However, it should be borne in mind that in cases of low degrees of freedom (such as ours), fit statistics have relatively little power to confirm a model [MacCallum et al., 1996]. This slightly moderates the confirmation of model fit. It may also be mirrored in the wide confidence intervals that can be observed for the RMSEA measures in both cases. In addition, results for group B represent a borderline case of model fit, as indicated by a fairly high RMSEA of .070.



Table 12: Results for a Final Simultaneous Equation Model with Fixed Effects for the Evaluation of PCIC

Overall model fit

                          Total sample       Group A            Group B
RMSEA                     .014               .037               .070
RMSEA 90% CI              (.000, .046)       (.007, .075)       (.035, .113)

Parameter estimates

Explanatory variable      Total sample       Group A            Group B

Dependent variable: PCIC
Intercept                 6.250              4.569              6.274
Leg                       -.559 (.017) ***   -.397 (.022) ***   -.457 (.027) ***
Imp, direct effect        -.010 (.017)        .003 (.019)       -.055 (.029) *
Imp, total effect         -.499              -.232              -.437
Diff                       .138 (.014) ***    .182 (.016) ***    .159 (.020) ***

Dependent variable: Leg
Intercept                 1.289              3.737               .714
Imp                        .875 (.009) ***    .591 (.013) ***    .839 (.015) ***

( ) standard error; *** p < .01; * p < .10. Since the data consists of partially dependent observations, controlling for these dependencies might lead to slightly lower levels of significance.



Comparing model coefficients for the total sample model (2) (table 12) and our initial model (1) (table 10) clearly shows that the effect of Imp on PCIC was considerably underestimated by the original single-equation fixed effects model (1). Although the direct effect (-.010) is still insignificant in model (2), the total effect (-.499) is quite large and only moderately smaller than the effect legitimacy has on PCIC (-.559). The impact of perceived importance on information cost is thus predominantly mediated by its influence on perceived legitimacy.
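The total effect of Imp can be recomputed from the other entries in table 12. The sketch below is an illustration added here; it assumes the usual path-analytic decomposition in which the total effect equals the direct effect plus the indirect effect, the latter being the product of the Imp-on-Leg and Leg-on-PCIC coefficients. The function name `total_effect` is hypothetical.

```python
def total_effect(direct: float, imp_on_leg: float, leg_on_pcic: float) -> float:
    """Total effect of Imp on PCIC: direct path plus the path mediated by Leg."""
    indirect = imp_on_leg * leg_on_pcic
    return direct + indirect


# (direct effect of Imp, effect of Imp on Leg, effect of Leg on PCIC), from table 12
estimates = {
    "total sample": (-0.010, 0.875, -0.559),
    "group A": (0.003, 0.591, -0.397),
    "group B": (-0.055, 0.839, -0.457),
}

totals = {name: total_effect(*paths) for name, paths in estimates.items()}
for name, value in totals.items():
    print(f"{name}: {value:.3f}")
```

Under this assumption the calculation agrees with the reported total effects for the total sample (-.499) and group A (-.232). For group B it yields about -.438 against the reported -.437, a gap that is presumably due to rounding of the published coefficients.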

Since the two group-specific models A and B display some significant differences, they were interpreted in more detail. Just as for the total sample, the most important driver of PCIC in both groups is the perceived legitimacy of an information request. Imp drives PCIC predominantly via its influence on Leg. However, for more person-related questions (group B) a small direct effect could be discerned. As might have been expected from the preceding analysis of the Leg-Imp relationship (figure 9), Imp has a much stronger influence on Leg in group B (more personal questions) than in group A. Likewise, the effect of Leg on PCIC is stronger in group B. Compared with the direct effect of Leg and the total effect of Imp on PCIC, the difficulty of answering a question contributes less to the cost perceived by respondents. As far as Diff is concerned, there are also only minor differences between the two groups.

6.2.4 Discussion of Results

With the PCIC index, a measure has been developed that to a certain extent reflects a user’s perception of self-disclosure when being asked for information online by an interactive agent. More precisely, it was shown how the perceived legitimacy, importance and difficulty of an agent question combine to create in online users a feeling of intuitive readiness or refusal to respond truthfully to a dialogue system.
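How the three ratings combine can be illustrated with the total-sample estimates for the PCIC equation in table 12. The sketch below is a simplified illustration added here: it omits the respondent dummy variables, so the result can at best be read as the prediction for the reference respondent (an assumption), and the example ratings are hypothetical. The function name `predicted_pcic` is likewise hypothetical.

```python
def predicted_pcic(
    leg: float,
    imp: float,
    diff: float,
    intercept: float = 6.250,   # total-sample intercept, table 12
    b_leg: float = -0.559,      # effect of legitimacy on PCIC
    b_imp: float = -0.010,      # direct effect of importance on PCIC
    b_diff: float = 0.138,      # effect of difficulty on PCIC
) -> float:
    """Predicted information cost on the 0-10 scale for given ratings.

    Respondent-specific dummy variables are NOT included (simplification).
    """
    return intercept + b_leg * leg + b_imp * imp + b_diff * diff


# Hypothetical ratings: a legitimate, important, easy question ...
welcome = predicted_pcic(leg=9, imp=8, diff=1)
# ... versus a barely legitimate, unimportant, easy question
intrusive = predicted_pcic(leg=2, imp=1, diff=1)
print(round(welcome, 3), round(intrusive, 3))
```

For these hypothetical ratings the predicted cost rises from roughly 1.3 to roughly 5.3 scale points, almost entirely because of the difference in perceived legitimacy.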

With this, a model has been created that may be used for the strategic design of agent interfaces. It suggests that agents should pay attention to the perceived legitimacy and importance of their information requests in the purchase context. Today, most electronic commerce web sites only ask users for desired product attributes (pd) (e.g. product configuration engines on manufacturers’ sites or product search engines on infomediary sites), or they ask them to fill out lengthy online questionnaires which mostly contain personal questions (peip). Very few sites have started to include questions on usage (u), and nobody is yet communicating with users about general product expectations (pepr) (see the critical discussion of current agents in [Spiekermann and Parachiv, 2001]). As was shown above, however, users do accept personal questions as long as they relate to the product context (pepr-questions). For example, asking a consumer whether he prefers trend models when choosing a jacket is initially a personal question, because it contains information on the consumer’s general attitude towards fashion. As such it has considerable value for sellers, because they directly learn about their buyer’s preference. However, the information unit also serves directly to recommend the right type of product to the client by taking into account the degree of trendiness of different models in the electronic choice process. Strictly speaking, most marketers therefore incur an opportunity cost of information today if they do not take advantage of the potential knowledge accumulation they can realize with pepr-questions. Additionally, as can be seen from graph A in figure 9, pepr- as well as pd-questions are less driven by the Imp factor than personal or usage-oriented questions (graph B has a steeper slope than graph A). This finding implies that as questions become slightly less important for the customer, their legitimacy does not decrease to the same extent. Taking advantage of this relationship means that marketers could ask customers pd- or pepr-questions that, even though less relevant to the buyer, are still important for product enhancement purposes. For example, asking consumers what type of closing mechanism they prefer for compact cameras might not be too relevant a question for most buyers. Yet, for manufacturers of compact cameras this information is highly valuable for product design decisions.

While these arguments suggest that there is room for online marketers to use dialogue systems as an effective means to collect consumer information, the question remains whether online users’ privacy concerns will impede an extensive collection of data. As was outlined above, privacy concerns are widely believed to potentially impede extensive online interaction. The next sections of this chapter will explore whether this belief is justified.



6.3 Privacy Preferences Versus Actual Interaction Behavior

On the basis of the answering ratios measured for the agent dialogue in chapter 5 (table 9), it was clear that experimental participants had, in fact, disclosed much more information about themselves to the shopping agent than initially expected. However, this openness could have been attributable to correspondingly low levels of privacy concern in the sample. As a result, the starting point of the privacy analysis was the measurement of privacy attitudes in the sample. These attitudes would then be contrasted with behavior.

6.3.1 Data Used for the Analysis

The data used to investigate privacy attitudes and behavior were taken from treatments 1 through 4. Thus, data from camera and jacket shoppers have been analysed simultaneously. As 6 of the 206 individual observations had missing data, analysis was based on 200 observations. Another group of 29 subjects was identified who did not see and consequently did not consciously answer or reject several agent questions. As this behavior could not be explained and as it could not be attributed to any privacy concerns, these subjects were excluded from analysis leading to a final dataset of 171 observations. Two data sources were used for analysis: questionnaire answers to discern privacy preferences and log files to analyse behavior.

6.3.2 Measurement of Privacy Attitudes through Cluster Analysis

To investigate privacy attitudes, this project built on earlier work by Ackerman et al. [1999]. Parts of a questionnaire were used that had been developed by this group of scholars to test privacy preferences. More precisely, 14 variables were used to derive participants’ privacy attitudes. Ten variables related to the readiness of subjects to reveal specific data units (such as e-mail address, name, hobbies or credit card number). Three variables were indices developed on the basis of different online scenarios, for which users indicated how they would behave in terms of data revelation. Finally, one variable referred to the question whether participants feared to sacrifice their privacy online. Appendix C, table C4 gives a detailed overview of the measures used. All data were z-transformed for the analysis.
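The z-transformation mentioned above can be illustrated with a minimal sketch. The answer values are hypothetical and not taken from the study, and the use of the sample standard deviation (n - 1 denominator) is an assumption about the settings used.

```python
from statistics import mean, stdev

def z_transform(values):
    """Standardize values to mean 0 and sample standard deviation 1."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator); an assumption
    return [(v - m) / s for v in values]

# Hypothetical answers of five participants on a 5-point scale
# (e.g. readiness to reveal an e-mail address); not data from the study.
answers = [1, 2, 3, 4, 5]
z_scores = z_transform(answers)
print([round(z, 3) for z in z_scores])
```

After the transformation all 14 variables share the same scale, so no single variable dominates the distance calculations of the cluster analysis.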



With the help of the SPSS software package a K-means cluster analysis [Bühl and Zöfel, 2000; Jain et al., 1999] was then conducted. In order to use K-means, it has often been pointed out that data needs to be based on interval scales [Stevens, 1946]. However, if equal distance between answer options can be assumed, which is the case for the current analysis, ordinal scales can equally be used in K-means analyses. As Traylor concluded [Traylor, 1983]: “Ordinal data can, in many circumstances, be treated as interval data without a great loss in accuracy and with a great gain in interpretability“.

An initial hierarchical clustering process based on squared Euclidean distances had indicated the existence of four distinct clusters in the data (for more detail, see the agglomerative schedule in Appendix C, table C5). Based on this target number of four clusters, K-means analysis was then conducted, starting out with a differentiated view on camera and jacket shoppers.

The differentiated analysis for the two product groups showed that the four clusters could be well separated in their privacy concerns (see table 13). Besides the two extreme groups, marginally concerned users (see table 13, cluster 1) and very concerned users (see table 13, cluster 4), two groups in between these extremes could be discerned. One group seemed to have a particular problem with the revelation of data such as postal address, e-mail address, phone number or credit card number (see table 13, cluster 3). The other group seemed to be more concerned about revealing information on computer equipment, salary, hobbies, health or age (see table 13, cluster 2). These two clusters were therefore called ‘identity’ and ‘profile’ concerned users. The distinction between the two groups in between was particularly pertinent for camera shoppers. Table 13 shows the details of these clusters, with low (negative) values standing for low privacy concerns and high (positive) values standing for stronger privacy concerns.



Table 13: Final Cluster Centres for K-means Cluster Analysis (Camera Shoppers)

Variable (z-score)        Cluster 1   Cluster 2   Cluster 3   Cluster 4
INDEX 1                     -.6470      -.7472       .1850       .6132
INDEX 3                     -.8163       .1962       .1735       .2007
INDEX 4                     -.3343      -.2759      -.6846       .5269
CONCERN ON PRIVACY          -.2124      -.3106      -.0101       .1517
NAME                       -1.0424      -.5599       .3563       .4757
ADDRESS                    -1.0488      -.6046       .4411       .4654
EMAIL USAGE                 -.8038      -.4687       .0674       .6202
PHONE NUMBER               -1.2049      -.1855       .2831       .4606
COMPUTER                    -.7552       .0447      -.5905       .6549
MONEYNEW                   -1.0210       .3327      -.5319       .6411
CREDIT CARD NUMBER           .1999      -.8702       .2439       .2549
HOBBY AND INTERESTS         -.6917      -.0607      -.7215       .8267
HEALTH                      -.8612       .5978      -.4953       .4536
AGE                         -.7509      -.1307      -.5374       .7302

K-means analysis was then conducted on the basis of the entire sample, combining data from jacket and camera shoppers (see table C8, Appendix C). For this purpose, the target cluster number was again set to four and cluster seeds were specified according to the cluster centres derived from camera shoppers. These cluster seeds were chosen in order to bring out the finding that there are, in fact, two distinct privacy preferences, profile and identity concerns, which earlier studies could not discern [Ackerman et al., 1999]. Thus, it was possible to separate the “pragmatic majority” identified by Ackerman et al. [1999] into two more meaningful groups, which were called “identity concerned” and “profiling averse” users. Figure 10 gives an overview of the four clusters identified and the share of users in these groups.
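The idea of running K-means from pre-specified cluster seeds can be sketched as follows. This is a plain textbook iteration (Lloyd's algorithm) and not necessarily identical to the SPSS procedure used in the study; the two-dimensional points, the seed values and all function names are hypothetical and serve illustration only.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, seeds, max_iter=100):
    """K-means iteration that starts from user-specified cluster seeds."""
    centres = [list(s) for s in seeds]
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its nearest current centre.
        new_assignment = [
            min(range(len(centres)), key=lambda k: squared_distance(p, centres[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed its cluster
        assignment = new_assignment
        # Move each centre to the mean of its members.
        for k in range(len(centres)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old centre if a cluster is empty
                centres[k] = [sum(col) / len(members) for col in zip(*members)]
    return centres, assignment

# Hypothetical z-scores: (identity concern, profile concern) per participant.
points = [(-1.0, -0.9), (-0.8, -1.1), (0.9, -0.7), (1.1, -0.5),
          (-0.6, 0.8), (-0.7, 1.0), (1.0, 0.9), (0.8, 1.1)]
# Hypothetical seeds: marginally concerned, profiling averse,
# identity concerned, fundamentalists.
seeds = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]
centres, assignment = k_means(points, seeds)
print(assignment)
```

Because the seeds fix the starting position of each cluster, the cluster numbering stays interpretable: cluster k of the combined sample corresponds to seed k from the subsample.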



Figure 10: Four Clusters Reflecting Fear to Lose Privacy through Profile or Identity Revelation on the Internet

In sum, the privacy clusters suggest that among all participants there was a basic level of privacy concern. Against the background of the older privacy studies cited above, this finding is not surprising. However, as has been discussed, the question still remains whether participants really act consistently with their expressed attitudes.

For this purpose, it was investigated in a next step whether interaction behavior would be consistent with the attitudes stated. Two aspects of interaction behavior were considered: (a) whether participants voluntarily communicated their address to Luci before entering the question-answer cycle, and (b) how many and what types of questions participants answered when communicating with Luci. The first variable is a measure of the willingness to satisfy an information request separated from the sales dialogue and linked to identification. It was expected that ‘identity concerned’ users (cluster 3) would be particularly averse to this type of information provision. The second variable is a measure of the willingness to provide information embedded in a sales dialogue. Since many personal and profile-sensitive questions were asked in this communication context, it was expected that profiling averse users (cluster 2) would be particularly reserved.

6.3.3 Comparing Privacy Attitudes with Behavior

6.3.3.1 Address Provision

As described in chapter 3, all participants had to pass an HTML page where agent Luci introduced herself and her purpose to the user and also gave participants the opportunity to leave their address (see screenshot _, Appendix B1b). No reason was specified why users should provide their address.

As expected from the nature of the cluster, marginally concerned users (cluster 1) had the lowest refusal rate in providing their home address for both privacy statements (30% for PS type 1 and 41% for PS type 2). Surprisingly, however, 24-28% of privacy fundamentalists also voluntarily provided their address before interacting with the agent. Identity concerned participants (cluster 3) also showed unexpected behavior. While 93% refused to provide their home address under the ‘softer’ privacy statement type 1, only 65% did so under the even harsher conditions of privacy statement type 2. Thus, 35% of identity concerned users provided their home address without any reason to do so.<42> All observations are summarized in table 14.

Notably, across privacy statements an average of 35-40% of participants gave their home address without any reason to do so. This raises the question of how privacy conscious online users really are. In particular, the mention of the ‘security providing’ EU law in PS 1 led to an increase in voluntary address provision, as can be seen for most clusters in table 14. The average difference of 5 percentage points more address provision with EU law citation (11 percentage points without the inconsistent group of cluster 3) was interesting, though not significant (χ²(1) = 0.33, p > 0.5).
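The reported test statistic can be retraced from the column sums of table 14 (31 and 47 participants with and without voluntary address provision under PS type 1, 33 and 60 under PS type 2). The following minimal sketch assumes that a Pearson chi-square test without continuity correction was applied to this pooled 2 x 2 table; the function names are illustrative only.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Rows: PS type 1, PS type 2; columns: address provided, address not provided.
chi2 = chi_square_2x2(31, 47, 33, 60)
p = p_value_df1(chi2)
print(f"chi2(1) = {chi2:.2f}, p = {p:.2f}")
```

Under these assumptions the statistic rounds to the reported value of 0.33, and the p-value lies above 0.5.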



Table 14: Contrasting Privacy Attitudes with Voluntary Address Provision

                            PS type 1    PS type 1       PS type 2    PS type 2
                            (voluntary   (no voluntary   (voluntary   (no voluntary
                            address      address         address      address         Sum of
Privacy clusters            provision)   provision)      provision)   provision)      participants
CL1: marginally concerned       14            6              13            9              42
  % of cluster                  70%          30%             59%          41%
CL2: profiling averse            9           10               7           19              45
  % of cluster                  47%          53%             27%          73%
CL3: identity concerned          1           13               7           13              34
  % of cluster                   7%          93%             35%          65%
CL4: fundamentalists             7           18               6           19              50
  % of cluster                  28%          72%             24%          76%
Sum total                       31           47              33           60             171
  % of sum                      40%          60%             35%          65%

6.3.3.2 Revelations During the Sales Dialogue

To represent the depth of interaction with the sales agent, the PCIC index described above was used. The PCIC index was calculated by inserting the number and type of questions answered by an individual participant into the PCIC regression functions A or B (table 12). The 171 PCIC index values were then split into terciles, contrasting individuals with low, medium and high disclosure. Table 15 summarizes the findings. It shows that participants from all clusters had a strong tendency to self-disclose: 87% of users were in the group with maximum PCIC values. This behavior could be observed across both product types, with 84% of camera shoppers and 98% of jacket shoppers in the highest PCIC group. Averaging across clusters, a mean of 85.8% of agent questions were answered (85.8% for cameras and 86.1% for jackets). As expected, however, the distribution of PCIC was different across clusters (χ²(6) = 16.57, p < .05).
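The test of PCIC distribution across clusters can be retraced from the counts in table 15. The sketch below assumes a standard Pearson chi-square test on the 4 x 3 table of clusters by PCIC group; the function names are illustrative, and the closed-form p-value used here is only valid for an even number of degrees of freedom.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df

def p_value_even_df(stat, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = stat / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

# Counts from table 15: rows are clusters 1 to 4, columns are low/medium/high PCIC.
table_15 = [
    [0, 0, 42],
    [3, 7, 35],
    [0, 1, 33],
    [3, 8, 39],
]
stat, df = chi_square(table_15)
p = p_value_even_df(stat, df)
print(f"chi2({df}) = {stat:.2f}, p = {p:.3f}")
```

Under these assumptions the table yields six degrees of freedom and a statistic that agrees with the reported value of 16.57, with a p-value below .05.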



An investigation of cluster details showed that privacy fundamentalists (cluster 4) in particular did not live up to their expressed attitude. 78% of them displayed high PCIC values and answered an average of 86% of the agent questions. With this, they answered only 10 percentage points fewer questions than marginally concerned participants (cluster 1). Comparing behavior for the two product groups, it was found that for cameras only 83% of privacy fundamentalists had a high PCIC value, while for jackets 95% of fundamentalists were in this group. A difference of 7% in self-disclosure between the two products can also be observed for cluster 2. The findings hint at the possibility that the product category may have an influence on the extent of information revelation. This is consistent with the finding in section 4.3 that jacket shoppers had a tendency to answer and modify slightly more personal questions than camera shoppers.

Consistent with expectations, profiling averse participants (cluster 2) gave less information during the shopping dialogue than identity concerned participants (cluster 3), with ‘only’ 78% of them being in the high PCIC group. Clusters 2 and 4 therefore turned out to be the groups with the most reserved behavior.

Mann-Whitney U tests for different PCIC distributions across the two privacy statements, both overall (p = .969) and for the two products separately (camera: p = .526; jackets: p = .227), showed no significant differences in this obvious readiness of users to self-disclose. This is a surprising result, as the privacy statement had been expected to have a greater impact on disclosure.



Table 15: Contrasting Privacy Attitudes with Online Communication Behavior

Privacy clusters             Low PCIC   Medium PCIC   High PCIC    Sum
CL1: marginally concerned        0           0            42        42
  Row %                          0%          0%          100%      100%
  Total %                        0%          0%           24%       24%
CL2: profiling averse            3           7            35        45
  Row %                          7%         15%           78%      100%
  Total %                        2%          4%           20%       26%
CL3: identity concerned          0           1            33        34
  Row %                          0%          3%           97%      100%
  Total %                        0%          1%           19%       20%
CL4: fundamentalists             3           8            39        50
  Row %                          6%         16%           78%      100%
  Total %                        2%          5%           23%       30%
Sum                              6          16           149       171
  Total %                        4%          9%           87%      100%

6.3.4 Discussion of Results

The results suggest that there is a huge discrepancy between online users’ expressed privacy concerns and their subsequent behavior. Regardless of their expressed attitudes towards the subject, the majority of participants were ready to reveal private and even highly personal information to the shopping agent and let themselves be ‘drawn into’ communication with the anthropomorphic agent. The degree of inconsistent behavior found in the data among the ‘privacy aware’ clusters 2 to 4 is particularly surprising. The results are even more relevant when one considers the experimental conditions: after all, agent questions were designed to include many non-legitimate and unimportant personal questions. Participants also had to sign that they agreed to the selling of their data to an anonymous entity. As was mentioned in chapter 2, efforts had been made to minimize sympathy with the experimenters during the experimental briefing. The conditions under which participants ‘revealed themselves’ were therefore probably even less favourable in terms of privacy than a regular Internet shopping trip would be. At the same time, a very avant-garde technology was employed, using an interactive agent system that provided users with real recommendation benefits in return for their data. This benefit offered in return for user data is comparable to the business scheme of many companies such as bonus card issuers (e.g. Payback) that today offer customers discounts in return for their data [Chang et al., 1999]. Against this background, the findings indicate that even though Internet users hold views on privacy, they do not act accordingly when they expect a benefit from their revelations. This is bad news for those who view privacy as a fundamental right. It suggests that the right to privacy or “the right to be let alone” [Warren and Brandeis, 1890] has become a tradable good which people are ready to sacrifice and commercialise.

6.4 Conclusion

Privacy concerns have been described as a major challenge for the design of human-agent interaction. However, the results obtained on online users’ privacy behavior in this study shed new light on people’s privacy concerns: while all users stated that they were at least marginally concerned about privacy, few of them acted accordingly when it came to disclosing information to a ‘sympathetic’ agent. At the same time, a significant tendency was observed among experimental participants to reduce interaction time and page requests the more privacy concerns they expressed (see the structural equation model results presented in chapter 3). Different strengths of privacy statements did not impact behavior.

Against the background of these findings it is hard to conclude whether privacy is ultimately an impeding factor for online consumers’ readiness to interact with agents. Privacy surveys collecting consumers’ attitudes as well as the findings from the structural equation model clearly indicate that people are concerned about their privacy. However, they seem to be willing to sacrifice it when responding to electronic shopping agents. There are two possible (and qualitative) explanations for this behavior, both of which probably deserve further research. First, participants may have consciously answered and modified agent questions because they perceived the benefit from interaction, the product recommendation, to outweigh the cost of revealing private information. This assumes that online users make a cost-benefit evaluation when judging whether disclosure is worthwhile. At this point it is, however, interesting to note that ‘rational’ users should have realized that at least purely personal agent questions (peip-questions) could not possibly have been used by agent Luci to calculate a product recommendation (and thus to provide the benefit). For example, answering the agent question “What do you usually do with your photographs?” (with answering options such as “collecting them in a box” or “glue them into an album”) strictly cannot lead a shopping agent to calculate a better product recommendation. As a result, users should not have expected any benefit from answering this type of personal agent question. One debriefing interview with a student revealed that she had made an interesting connection between the respective agent question and the product recommendation. “Perhaps”, she said, “the agent would respect in his recommendation that photos be put in an album and therefore expect the photos taken by the camera to be of really high quality”. The student did not reflect on the fact that the development and quality of photographs (let alone album collection) are completely independent of the type of compact camera used. This type of illogical connection made in interacting with agents may be an interesting area of psychological research.

The second explanation why participants answered so many agent questions could simply be their ignorance of the privacy implications of electronic communication. This potential naivety was reflected in one debriefing interview with another participant, who stated that the experimenters would not be able to interpret his interaction behavior with sales agent Luci because, after all, he had ‘erased’ all initial answers provided to the agent once he had profited from the recommendations. Obviously, the participant was not aware of the fact that every user request is logged by the server providing the web site service and that for this reason all his preference data (as well as the erasure process, of course) had been registered. The anecdote is in line with Goldberg, who stated in 1997 [Goldberg, 1997]: “New users of the Internet generally do not realize that every post they make to a newsgroup, every piece of e-mail they send, every World Wide Web page they access, and every item they purchase online could be monitored or logged by some unseen third party.” For the current analysis, it must however be mentioned again that 86.7% of the participants had stated that they regularly use the Internet and e-mail, which puts our findings in another light: they suggest that even frequent users are not necessarily knowledgeable about the technological processes taking place ‘behind the screen’ and are thus not capable of effectively protecting their privacy.


Footnotes:

<40>

See e.g. European Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data. Also, the ’Recht auf Informationelle Selbstbestimmung‘ which is part of the German ’Grundgesetz‘ recognizes privacy as a fundamental right of people (here it falls among the ’Allgemeinen Persönlichkeitsrechte‘ Art. 1 Abs. 1 GG).

<41>

Note that these 39 subjects did not know anything of the shopping experiment and also did not participate in it.

<42>

The addresses provided were checked in the click-stream data and it seemed that no false addresses had been provided.


© The compilation of content and the presentation of this publication, as well as its electronic processing, are protected by copyright. Any use that is not expressly permitted by copyright law requires prior consent. This applies in particular to reproduction, editing, and storage and processing in electronic systems.

DiML DTD Version 2.0
Certified document server
of Humboldt-Universität zu Berlin
HTML version created on:
Mon Dec 23 13:23:15 2002