
Extension of user privacy requirements

“Not everything that can be counted counts, and not everything that counts can be counted.” (Albert Einstein)

Chapter 4 discussed the impact of privacy restrictions specified in legal frameworks and P3P policies on our analysis framework presented in Chapter 3. As indicated in Chapter 3, the results from the framework can be particularly useful for Web site personalization. Since personalization systems become more effective with an increasing amount of user information, the impact of consumer privacy concerns is particularly high for these applications. This chapter discusses privacy concerns from a consumer point of view in more detail. We compare 30 consumer privacy surveys, categorize them, and point out their particular implications for personalization systems.

This chapter is organized as follows: Section 5.1 defines characteristics of personalization. Section 5.2 categorizes personalization systems according to the input data they require. Section 5.3 summarizes privacy concerns from more than 30 consumer surveys and describes their impact on personalization systems. Differences between consumers’ privacy views and their actual behaviors, and differences between consumer and industry opinions on privacy are also presented. Section 5.4 discusses future research directions and proposes approaches on how to increase consumer trust in personalization systems.

5.1  User-adaptable vs. user-adaptive systems

Personalized (or “user-adaptive”) systems have gained substantial momentum with the rise of the WWW. The market research firm Jupiter [Foster, 2000] defines personalization as predictive analysis of consumer data used to adapt targeted media, advertising and merchandising to consumer needs. A more Web-oriented definition was proposed by [Kobsa, et al., 2001], who regard a personalized hypermedia application as a hypermedia system that adapts the content, structure and/or presentation of the networked hypermedia objects to each individual user’s characteristics, usage behavior and/or usage environment. In contrast to user-adaptable systems, where the user is in control of the initiation, proposal, selection and production of the adaptation, user-adaptive systems perform all steps autonomously.

The advantages of personalization can be manifold. Web site visitors see the major benefits in sites being able to offer more relevant content and to recall user preferences and interests [Cyber Dialogue, 2001]. The personalization of hypermedia is beneficial for several other purposes as well, most notably for improving the learning progress in educational software [Brusilovsky, et al., 1998; Specht, 1998]. Given the increasing amount of information offered on the Internet, the development of advanced personalized services seems inevitable.

Personalization systems need to acquire a certain amount of data about users’ interests, behavior, demographics and actions before they can start adapting to them. Thus, they are often only useful in domains where users engage in extended (and most often repeated) system use. They may not be appropriate for infrequent users with typically short sessions. The extensive and repeated collection of detailed user data, however, may provoke consumer privacy concerns. Consumer surveys show that the number of consumers refusing to shop online because of privacy concerns is as high as 64% [Culnan and Milne, 2001]. Finding the right balance between privacy protection and personalization remains a challenging task.

5.2  Input data for personalization

Kobsa [2001] divides the data that are relevant for personalization purposes into ‘user data’, ‘usage data’, and ‘environment data’. ‘User data’ denote information about personal characteristics of a user, while ‘usage data’ relate to a user’s (interactive) behavior (e.g. as captured in the Web log). A special kind of ‘usage data’ is ‘usage regularities’, which describe frequently reoccurring interactions of users. ‘Environment data’ refer to the user’s software and hardware, and the characteristics of the user’s current locale.

Table 5-1 lists the most frequently occurring subtypes of these data. The taxonomy allows one to refer to specific kinds of personalization systems more easily, and facilitates our analysis of privacy concerns and their impacts on certain system types.



Table 5-1: Types of personalization-relevant data and examined systems

No. | Input Data | Examples of User-Adaptive Systems

A) User Data:
I | Demographic Data | Personalized Web sites based on user profiles; software providers: Broadvision, Personify, NetPerceptions etc.
II | User Knowledge | Expertise-dependent personalization; product and technical descriptions: Sales Assistant [Popp and Lödel, 1996], SETA [Ardissono and Goy, 2000]; learning systems: KN-AHS [Kobsa, et al., 1994], [Brusilovsky, 2001]
III | User Skills and Capabilities | Help systems: Unix Consultant [Chin, 1989], [Küpper and Kobsa, 1999]; disabilities: AVANTI [Fink, et al., 1998]
IV | User Interests and Preferences | Recommender systems [Resnick and Varian, 1997]; used car domain: [Jameson, et al., 1995]; domain of telephony devices: [Ardissono and Goy, 1999]
V | User Goals and Plans | Personalized support for users with targeted browsing behavior, plan recognition: [Lesh, et al., 1999], PUSH [Höök, et al., 1996], HYPERFLEX [Kaplan, et al., 1993]

B) Usage Data:
VI | Selective Actions | Adaptation based on link selection: WebWatcher [Joachims, et al., 1997], Letizia [Lieberman, 1995]; image selection: Adaptive Graphics Analyser [Holynski, 1988]
VII | Temporal Viewing Behavior | Adaptation based on viewing time; streaming objects: [Joerding, 1999]; temporal navigation behavior: [Chittaro and Ranon, 2000]; micro-interaction: [Sakagami, et al., 1998]
VIII | Ratings | Adaptation based on object ratings; product suggestions: Firefly [Shardanand and Maes, 1995], GroupLens [Konstan, et al., 1997]; Web pages: [Pazzani and Billsus, 1997]
IX | Purchases and Purchase-related Actions | Suggestions of similar goods after product selection: Amazon.com; other purchase-related actions: registering, transferring products into the virtual shopping cart, quizzes
X | Other (Dis-)Confirmatory Actions | Adaptation based on other user actions, e.g. saving or printing documents, bookmarking a Web page: [Konstan, et al., 1997]

C) Usage Regularities:
XI | Usage Frequency | Adaptation based on usage frequency; icon toolbar: [Debevc, et al., 1996], Flexcel [Krogsaeter, et al., 1994]; Web page visits: AVANTI [Fink, et al., 1998]
XII | Situation-Action Correlations | Interface agents; routing mails: [Mitchell, et al., 1994], [Maes, 1994]; meeting requests: [Kozierok and Maes, 1993]
XIII | Action Sequences | Recommendations based on frequently used action sequences, e.g. past actions, action sequences of other users

D) Environment Data:
XIV | Software Environment | Adaptation based on users’ browser versions and platforms, availability of plug-ins, Java and JavaScript versions
XV | Hardware Environment | Adaptation based on users’ bandwidth, processor speed, display devices (e.g. resolution), input devices
XVI | Locale | Adaptation based on users’ current location (e.g. country code), characteristics of the usage locale
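The taxonomy in Table 5-1 can be captured in a small data structure, which is how the later tables implicitly map survey findings to affected system types. The following Python sketch is illustrative and abridged (only a few of the sixteen types are listed); all names are chosen here for clarity, not taken from any implementation.

```python
from enum import Enum

class Category(Enum):
    USER = "user data"
    USAGE = "usage data"
    REGULARITIES = "usage regularities"
    ENVIRONMENT = "environment data"

# Abridged excerpt of Table 5-1: type number -> (name, top-level category).
INPUT_TYPES = {
    "I":   ("Demographic Data",                     Category.USER),
    "IV":  ("User Interests and Preferences",       Category.USER),
    "VI":  ("Selective Actions",                    Category.USAGE),
    "IX":  ("Purchases and Purchase-related Actions", Category.USAGE),
    "XI":  ("Usage Frequency",                      Category.REGULARITIES),
    "XVI": ("Locale",                               Category.ENVIRONMENT),
}

def types_in(category):
    """Return all input-data type numbers belonging to a category."""
    return [no for no, (_, cat) in INPUT_TYPES.items() if cat is category]
```

A survey finding that concerns, say, ‘user data’ in general can then be translated into the set of affected system types via `types_in(Category.USER)`.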

5.3  Results from privacy surveys

5.3.1 Impacts on user-adaptive systems

We categorized 30 recent consumer surveys on Internet privacy (or summaries of such surveys) and analyzed their potential impacts on the different types of personalization systems listed in Table 5-1 (a summary of this taxonomy appears in Kobsa et al. [2001]). Questions from different surveys addressing the same privacy aspects were grouped together to convey a more complete picture of user concerns. Eleven documents included all questions, six provided an extensive discussion of survey results, and ten contained factual executive summaries. For three studies, only press releases were available.

We distinguished several categories of privacy aspects. The category ‘privacy of user data in general’ has a direct impact on any personalization system that requires personal data (such as the user’s name, address, income etc.). The category ‘privacy in a commercial context’ primarily affects personalized systems in e-commerce. ‘Tracking of user sessions’ and ‘use of cookies’ influence user-adaptive systems requiring usage data. A few studies focus on ‘e-mail privacy’. This category might have an impact on user-adaptive systems that generate targeted e-mails. Two studies directly address the topic of privacy and personalization [Mabley, 2000; Personalization Consortium, 2000]. They are highly interesting because they directly affect most personalization systems.

Table 5-2: Results regarding user data in general

Results regarding user data in general | Systems affected
Internet users who are concerned about the security of personal information: 83% [Cyber Dialogue, 2001], 70% [Behrens, 2001], 72% [UMR, 2001], 84% [Fox, et al., 2000] | I, II, IV, V, IX
People who have refused to give (personal) information to a Web site: 82% [Culnan and Milne, 2001] | I, II, IV, V, IX
Internet users who would never provide personal information to a Web site: 27% [Fox, et al., 2000] | I, II, IV, V, IX
Internet users who supplied false or fictitious information to a Web site when asked to register: 34% [Culnan and Milne, 2001], 24% [Fox, et al., 2000] | I, II, IV, V, IX
Online users who think that sites that share personal information with other sites invade privacy: 49% [Cyber Dialogue, 2001] | I, II, IV, V, IX, XIII

A significant concern about the use of personal information can be seen in these results, which is a problem for those personalization systems in Table 5-1 that require ‘user data’ (such as ‘demographic data’, data about ‘user knowledge’, etc.). Systems that record ‘purchases and purchase-related actions’ may also be affected. More than a quarter of the respondents even indicated that they would never consider providing personal information to a Web site. Quite a few users indicated having supplied false or fictitious information to a Web site when asked to register, which makes user linking across sessions, and thereby accurate recommendations based on ‘user interests and preferences’, very difficult.



Table 5-3: Results regarding user data in a commercial context

Results regarding user data in a commercial context | Systems affected
People wanting businesses to seek permission before using their personal information for marketing: 90% [Roy Morgan Research, 2001] | I, II, IV, V, IX
Non-online shoppers who did not purchase online because of privacy concerns: 66% [Ipsos Reid, 2001], 68% [Interactive Policy, 2002], 64% [Culnan and Milne, 2001] | I, II, IV, V, IX
Online shoppers who would buy more if they were not worried about privacy/security issues: 37% [Forrester, 2001], 20% [Department for Trade and Industry, 2001] | I, II, IV, V, IX
Shoppers who abandoned online shopping carts because of privacy reasons: 27% [Cyber Dialogue, 2001] | I, II, IV, V, IX
People who are concerned if a business shares their data for a purpose different from the original one: 91% [UMR, 2001], 90% [Roy Morgan Research, 2001] | IX, XIII

These results suggest that in a commercial context, privacy concerns may play an even more important role than for general personalized systems. Most people want to be asked before their personal information is used, and many regard privacy as a must for Internet shopping. Thus, commercial personalization systems need to include privacy features. In particular, those systems in Table 5-1 that require ‘demographic data’, ‘user knowledge’, ‘user interests and preferences’, ‘user goals and plans’ and ‘purchase-related actions’ are affected.

Furthermore, more than 90% of respondents are concerned if a business shares their information for a purpose different from the original one. This has a severe impact on central user modeling servers that collect data from, and share them with, different user-adaptive applications, unless sharing can be controlled by the user [Kobsa, 2001; Kobsa and Schreck, 2003].



Table 5-4: Results regarding user tracking and cookies

Results regarding user tracking and cookies | Systems affected
People who are concerned about being tracked on the Internet: 60% [Cyber Dialogue, 2001], 54% [Fox, et al., 2000], 63% [Harris Interactive, 2000] | VI-X, XIV-XVI
People who are concerned that someone might know what Web sites they visited: 31% [Fox, et al., 2000] | VI-X, XIV-XVI
Internet users who generally accept cookies: 62% [Personalization Consortium, 2000] | VI-X, XIV-XVI
Internet users who set their computers to reject cookies: 25% [Culnan and Milne, 2001], 3% [Cyber Dialogue, 2001], 31% in warning mode [Cyber Dialogue, 2001], 10% [Fox, et al., 2000] | VI-X, XIV-XVI
Internet users who delete cookies periodically: 52% [Personalization Consortium, 2000] | VI-X, XIV-XVI
Users uncomfortable with schemes that merge tracking of browsing habits with an individual’s identity: 82% [Harris Interactive, 2000] | I, II, IV-X, XIV-XVI
Users who feel uncomfortable being tracked across multiple Web sites: 91% [Harris Interactive, 2000] | VI-X, XIII, XIV-XVI

Users’ privacy concerns about tracking and cookies affect the acceptance of personalization systems based on ‘usage data’ and ‘usage regularities’ (cf. Table 5-1). In particular, systems using ‘selective actions’, ‘temporal viewing behavior’ and ‘action sequences’ conflict with users’ privacy preferences. More than 50% of Internet users are concerned about Internet tracking [Cyber Dialogue, 2001; Fox, et al., 2000]. Fox et al. [2000] found that user tracking is not welcome even when users receive personalized content in return. A significant number claimed they would set their browser to reject cookies [Culnan and Milne, 2001; Mabley, 2000], and more than half of the users stated they would delete cookies periodically [Personalization Consortium, 2000].

These results directly affect machine-learning methods that operate on user log data, since without cookies, sessions of the same user can no longer be linked. User concerns about tracking schemes across multiple Web sites affect personalization systems that combine information from several sources, in particular those systems that use data from ‘action sequences’, ‘demographics’, ‘purchase-related actions’ and the user’s ‘locale’.
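The dependence on cookies can be illustrated with a minimal sketch (the log records below are invented for illustration, not drawn from the surveys): requests arriving without a cookie cannot be assigned to any user, so they are lost for usage-based personalization.

```python
from collections import defaultdict

# Hypothetical Web log records: (cookie_id, requested_url).
# cookie_id is None when the browser rejected or deleted its cookie.
log = [
    ("c1", "/home"), ("c1", "/products"),
    (None, "/home"),
    ("c2", "/home"),
    ("c1", "/checkout"),
]

def sessions_by_cookie(records):
    """Group requests by cookie ID. Requests without a cookie cannot be
    linked to any user and are lost for usage-based personalization."""
    linked = defaultdict(list)
    unlinked = []
    for cookie, url in records:
        if cookie is None:
            unlinked.append(url)
        else:
            linked[cookie].append(url)
    return linked, unlinked
```

When users delete cookies periodically, the same person reappears under a new cookie ID, so even the `linked` groups fragment into disconnected partial histories.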

Most users do not consider current forms of tracking as helpful methods to collect data for personalization. Users’ participation in deciding when and what usage information should be tracked might decrease such privacy concerns.

Table 5-5: Results regarding e-mail privacy

Results regarding e-mail privacy | Systems affected
People who have asked for removal from e-mail lists: 78% [Cyber Dialogue, 2001], 80% [Culnan and Milne, 2001] | XII
People who complain about irrelevant e-mail: 62% [Ipsos Reid, 2001] | XII
People who have received unsolicited e-mail: 95% [Cyber Dialogue, 2001] | XII
People who have received offensive e-mail: 28% [Fox, et al., 2000] | XII

In the category of e-mail privacy, 62% of the users complain about irrelevant e-mail [Ipsos Reid, 2001]. Almost every Internet user has already received unsolicited e-mail [Mabley, 2000]. This may constitute a problem for the acceptance of personalized e-mail. The problem affects primarily those systems in Table 5-1 that use ‘situation-action correlation’. The findings indicate that many deployed e-mail personalization systems, such as software for the management of targeted marketing campaigns, are not yet able to address user needs specifically enough to evoke positive reactions among the recipients.



Table 5-6: Results regarding privacy and personalization

Results regarding privacy and personalization | Systems affected
Online users who see personalization as a good thing: 59% [Harris Interactive, 2000] | I-XVI
Online users who do not see personalization as a good thing: 37% [Harris Interactive, 2000] | I-XVI
Types of information users are willing to provide in return for personalized content: name: 88%, education: 88%, age: 86%, hobbies: 83%, salary: 59%, credit card number: 13% [Cyber Dialogue, 2001] | I, II, IV, V, IX
Internet users who think tracking allows the site to provide information tailored to specific users: 27% [Fox, et al., 2000] | VI-X, XIV-XVI
Online users who think that sites that share information with other sites do so to interact better with users: 28% [Cyber Dialogue, 2001] | I-XVI
Online users who find it useful if a site remembers information (preferred colors, delivery options etc.): 50% [Personalization Consortium, 2000] | I-V, IX, XIV-XVI
People who are bothered if a Web site asks for information one has already provided (e.g., mailing address): 62% [Personalization Consortium, 2000] | I-V, IX, XIV-XVI
People who are willing to give information to receive a personalized online experience: 51% [Personalization Consortium, 2000], 40% [Roy Morgan Research, 2001], 51% [Privacy & American Business, 1999] | I-V, IX

The results of the study by Harris Interactive [2000] affect all systems in Table 5-1. A significant portion of the respondents do not seem to see enough value in personalization to be willing to give out personal data. If at all possible, personalization should therefore be designed as an option that can be switched off. Finally, Internet users also demonstrated less commitment to providing personal information in return for personalized content when a Web site would share this information with other sites. This result applies to all personalized systems that share information via a central user modeling server [Kobsa, 2001].

5.3.2 Differences in consumer statements and actual privacy practices

This meta-analysis demonstrates that consumers are highly concerned about the privacy implications of various data collection methods, but many would share some data in return for personalization.23 Users, however, do not always seem to have a good understanding of their privacy needs in a personalization context; stated privacy preferences and actual behavior often diverge.

5.3.3 Differences in the privacy views of consumers and industry

Besides differences between consumers’ self-perception and actual behavior, our analysis of survey results also uncovered a few major discrepancies between the privacy views of consumers and industry. Consumer expectations and actual industry practices should, however, be in line with each other, so that consumers can build trust, which is the basis for the acceptance of personalization. For instance, 54% of consumers do not believe that most businesses handle the personal information they collect in a proper and confidential way [Harris Interactive, 2003; Responsys.com, 2000]. In contrast, 90% of industry respondents believe that this is the case for their own business, and 46% that this is the case for industry in general.24

Consumer demands and current practice in companies also diverge significantly on the issue of data control. Most Internet users (86%) believe that they should be allowed control over what information is stored by a business [Fox,et al., 2000], but only 17% of businesses allow users to delete at least some personal information [Andersen Legal, 2001]. Furthermore, 40% of businesses do not provide access to personal data for verification, correction and updates [Deloitte Touche Tohmatsu, 2001].

Industry and consumers also disagree significantly on the value of privacy laws. Nine of ten marketers claim that the current regime of self-regulation works for their companies, and 64% think that government involvement will ultimately hurt the growth of e-commerce [Responsys.com, 2000]. In contrast, two-thirds of e-mail users think that the federal government should pass more laws to ensure citizens’ privacy online [Gallup Organization, 2001], while only 15% supported self-regulation [Harris Interactive, 2000]. However, it has been found that consumers’ trust in the effectiveness of privacy legislation has meanwhile decreased [Harris Interactive, 2001].

Although both governments and private organizations have made serious efforts to ease users’ privacy concerns, much remains to be done to build and maintain customer confidence, which is a prerequisite for successful personalization.

5.3.4 Discussion of the methodology

The cited studies were mostly conducted by well-known research institutions and market research firms between 2000 and 2003. The number of respondents in the studies varied between 500 and 4500, with an average of about 2000. The answers were collected by telephone interviews and online questionnaires. From the 30 surveys analyzed, 21 were conducted in the US, three in Canada, two in Australia and New Zealand, two in Britain and one in the European Union. One survey was based on an international respondent sample.

Though this meta-analysis provides a more comprehensive and objective overview of privacy concerns and their impacts on personalization than can be expected from a single study, some caution should be exercised. A general problem is the lack of comparability of the studies: small differences in the wording of the questions, their context in the questionnaires, the recruitment method and the sample population make user statements difficult to compare. Harper and Singleton [2001] criticized the use of manipulative questions in many privacy studies, a lack of trade-offs between privacy and other desires, and imprecise terminology (e.g. the term “privacy” is often understood as a synonym for security, or a panacea against identity fraud and spam). Finally, as mentioned above, disparities seem to exist between people’s responses to general, context-less privacy questions, and their behavior when working with concrete Web sites having specific goals in mind.

5.4  Conclusion

Our meta-analysis of consumer surveys demonstrated that users’ privacy concerns are substantial. We discussed survey results regarding Web user data in general, Web user data in a commercial context, Web usage data, e-mail privacy, and personalization, and described the impact of privacy concerns on personalization systems.



Two different directions can be pursued to alleviate these concerns. In one approach, users receive commitments that their personal data will be used for specific purposes only, including personalization. Such commitments can be given in, e.g., individual negotiations or publicly displayed privacy promises (“privacy policies”), or they can be mandated in privacy laws as discussed in Section 4.2.1. It is necessary, though, that these privacy commitments be guaranteed. They ought to be enforced through technical means [Agrawal, et al., 2002; Fischer-Hübner, 2001; Karjoth, et al., 2003], or otherwise through audits and legal recourse. Since individual privacy preferences may vary considerably between users, Kobsa [2003] proposes a meta-architecture for personalized systems that allows them to cater to individual privacy preferences and to the privacy laws that apply to the current usage situation. The personalized system would then exhibit the maximum degree of personalization that is permissible under these constraints.
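The idea of exhibiting the maximum permissible degree of personalization can be sketched as a simple constraint check: each personalization method requires certain input-data types from Table 5-1, and only methods whose requirements are fully covered by the data types the user (or the applicable law) permits are enabled. This is an illustrative simplification of the meta-architecture, with method names and requirement sets invented for the example.

```python
# Hypothetical personalization methods and the Table 5-1 input-data
# types (by roman numeral) they require.
METHOD_REQUIREMENTS = {
    "product_recommendations": {"IV", "IX"},  # interests, purchases
    "expertise_adaptation":    {"II"},        # user knowledge
    "usage_based_shortcuts":   {"VI", "XI"},  # selective actions, frequency
}

def permissible_methods(permitted_data):
    """Return the methods whose required input-data types are all
    permitted under the current privacy constraints."""
    return {method for method, required in METHOD_REQUIREMENTS.items()
            if required <= set(permitted_data)}
```

For example, a user who permits types II, IV and IX would receive product recommendations and expertise-dependent adaptation, but no usage-based shortcuts, since those would require tracking selective actions and usage frequency.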

The other approach is to allow users to remain anonymous with regard to the personalized system and the whole network infrastructure, whilst enabling the system to still recognize the same user in different sessions so that it can cater to her individually [Kobsa and Schreck, 2003]. Karat, Brodie, Karat, Vergo and Alpert [2003] also address this requirement through different levels of identity. Anonymous interaction seems to be desired by users (however, only a single user poll has addressed this question explicitly so far [GVU, 1998]). One can expect that anonymity will encourage users to be more open when interacting with a personalized system, thus facilitating and improving the adaptation to the respective user. As discussed in Section 4.2.1, the anonymous use of data can relieve the providers of personalized systems from restrictions and duties imposed by privacy laws (they may however choose to observe these laws nevertheless, or to provide other privacy guarantees on top of anonymous access).

It is currently unclear which of these two directions should be preferably pursued. Each alternative has several advantages and disadvantages. Neither is a full substitute for the other, and neither is guaranteed to alleviate users’ privacy concerns, which ultimately result from a lack of trust. For the time being, both directions need to be pursued.


Footnotes and Endnotes

23 Users’ willingness to share information with a Web site may also depend on other factors that are not considered here such as the usability of a site, users’ general level of trust towards a site, and the company or industry to which the site belongs. For example, good company reputation makes 74% of the surveyed Internet users more comfortable disclosing personal information [Ipsos Reid, 2001].

24 However, only 40% of businesses say steps have been taken to secure personal information held by a site [Internet Privacy Survey, 2001], and 55% do not store personal data in encrypted form. 15% share user data with third parties without having obtained users’ permission [Deloitte Touche Tohmatsu, 2001].


