This is a much shortened version of our presentation (by Keidra Chaney & Raizel Liebler) at the Law and Society Conference. The complete version (with citations!) will be published in the next Buffalo Intellectual Property Law Journal.
What is Web Analytics?
The Web Analytics Association (WAA), the worldwide professional organization for web analytics, officially defines web analytics as “the measurement, collection, analysis and reporting of Internet data for the purposes of understanding and optimizing Web usage.”
Web analytics involves the collection and measurement of various forms of online user data, and is traditionally used by market researchers and web professionals to measure the effectiveness of website communication. As web transactions have become a major source of revenue for companies large and small, online marketing and web communication have become a higher priority for marketing departments, and with them the measurement and optimization of user results. Web analytics commonly provides information on online user activity, including web page views, number of visitors, visitor location and referring websites. Marketers then use this information to evaluate the effectiveness of website content.
The WAA cites the 1993 founding of web analytics software company WebTrends as the formal beginning of web analytics as an industry and a profession. There are two primary methods of data collection used by web analytics software to track user sessions on a website:
1.) Logfile analysis, which uses the log files stored on a website server to collect information on users’ IP addresses, date/time information and referring websites. A number of open source web analytics tools, such as AWStats and Piwik, employ this method.
2.) Page tagging, which involves placing JavaScript code on a webpage to notify a third-party server whenever the page is loaded in a browser such as Microsoft Internet Explorer or Firefox. This method is employed by Google Analytics.
Cookies, a data collection method used by most hosted analytics software companies, track user sessions by placing a small piece of text on a user’s computer when a browser loads a page. The use of cookies by analytics vendors, including Google Analytics, will be the focus of much of our discussion and analysis in this article.
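To make the logfile method more concrete, here is a minimal Python sketch of how a tool might pull IP addresses, timestamps and referring websites out of server log lines. It assumes the logs are in the common "combined" format that many web servers write; the function names, the sample lines and the example.org addresses are made up for illustration and are not taken from AWStats or any other product.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed input: "combined" log format lines. Real servers can be configured differently.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


def summarize(lines):
    """Count page views, distinct IP addresses and referring websites."""
    records = [r for r in map(parse_line, lines) if r is not None]
    return {
        "page_views": len(records),
        "unique_ips": len({r["ip"] for r in records}),
        # A referrer of "-" means the browser sent none, so it is skipped.
        "referrers": Counter(r["referrer"] for r in records if r["referrer"] != "-"),
    }


# Hypothetical sample data using documentation-only IP addresses.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Mar/2010:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "http://www.example.org/search" "Mozilla/5.0"',
    '203.0.113.5 - - [10/Mar/2010:13:57:02 +0000] "GET /about.html HTTP/1.1" 200 1450 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Mar/2010:14:01:10 +0000] "GET /index.html HTTP/1.1" 200 2326 "http://www.example.org/search" "Mozilla/5.0"',
]

summary = summarize(SAMPLE_LOG)
```

Note that counting distinct IP addresses is only a rough stand-in for counting visitors, since several people can share one address and one person can use several.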
Cookies
An HTTP cookie is a very small text file that is placed on a user’s computer hard disk by a web server when the user loads a webpage in their browser. Cookies are commonly employed by web servers to track and authenticate detailed information about online users, based on identifying the specific computer/browser combination of the user. First-party cookies are issued by the same website domain being visited. Third-party cookies are issued by a different domain and are used to track user activity across multiple websites.
Third-party cookies are commonly used by e-commerce companies for targeted online advertising based on clickstream behavior. While cookies are used by most analytics companies for data collection (including Google Analytics), privacy concerns do prompt some users to delete cookies from their computers after use. According to a 2007 report from web analytics firm comScore, 3 out of 10 internet users regularly delete cookies from their computers.
While cookie technology is not designed to violate consumer privacy, there have been instances of companies using this technology maliciously. A 2006 study on consumer understanding of cookie technology showed that users remain unclear about how cookie technology is used by websites, the advantages and disadvantages of its use, and the differences between cookie technology, viruses and malware.
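The first-party versus third-party distinction can be sketched in a few lines of Python. The example below uses the standard library's http.cookies module to build the kind of header a server sends to set a cookie, and a simple helper to decide whether a cookie's domain matches the page being visited. The helper names, cookie name and domains are hypothetical, and the domain comparison is a simplification: real browsers apply more detailed matching rules than this.

```python
from http.cookies import SimpleCookie
from urllib.parse import urlsplit


def build_set_cookie(name, value, domain, max_age_seconds=None):
    """Build the value of a Set-Cookie header for the given cookie."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["domain"] = domain
    cookie[name]["path"] = "/"
    if max_age_seconds is not None:
        # Max-Age tells the browser how long to keep the cookie.
        cookie[name]["max-age"] = max_age_seconds
    return cookie[name].OutputString()


def is_third_party(page_url, cookie_domain):
    """Return True if the cookie's domain differs from the site being visited.

    Simplified assumption: a cookie is first-party when the page's host equals
    the cookie domain or is a subdomain of it.
    """
    page_host = (urlsplit(page_url).hostname or "").lower()
    cookie_domain = cookie_domain.lstrip(".").lower()
    same_site = page_host == cookie_domain or page_host.endswith("." + cookie_domain)
    return not same_site


# Hypothetical cookie set by the site the user is actually visiting.
header = build_set_cookie("visitor_id", "abc123", "example.com", max_age_seconds=60)

first_party = is_third_party("http://www.example.com/page", "example.com")
third_party = is_third_party("http://www.example.com/page", "ads.example.net")
```

Here first_party comes out False and third_party comes out True, because only the second cookie belongs to a domain other than the one in the address bar.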
Google Analytics
In 2005, Google acquired enterprise web analytics software provider Urchin, and began offering a modified version of Urchin’s software free of charge. As of 2009, Google Analytics had 59% of overall web analytics market share, according to a study by online analytics expert Stephane Hamel. An enterprise-level, licensed version of the Urchin software is still available for purchase from Google.
Google Analytics (GA) collects data through a combination of first-party cookies and JavaScript page tagging. GA does not collect personally identifiable information but does log user activity and identify unique visitors through the use of several types of cookies. The two most commonly referred to are:
Session-based cookies are set when a user views a page on a site. On each page view, the Google Analytics JavaScript code attempts to update this cookie. If no cookie is found, a new one is written and a new session is established. Session-based cookies are updated to expire in 30 minutes, so a session ends once 30 minutes pass without a page view.
Persistent cookies are used to identify a unique visitor to a website. This cookie is written to the browser upon a user’s first visit to a site from a particular web browser. It is stamped with a unique user ID and updated to expire in 2 years, so that returning visitors to a website can be identified.
Google employs persistent cookies for many of its services, including Gmail, to authenticate users. Privacy advocates have criticized this policy for its potential to leave personal user data exposed to hackers and other security vulnerabilities.
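The two expiry rules above can be turned into a short Python sketch. This is not Google's code. It is an illustration of the logic described, under the assumption that each page view pushes the session cookie's expiry forward by 30 minutes and each visit pushes the persistent cookie's expiry forward by two years. The function names are hypothetical, and two years is approximated as 730 days.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)   # session cookie lifetime from the text
VISITOR_LIFETIME = timedelta(days=730)    # roughly two years, an approximation


def count_sessions(page_view_times):
    """Count sessions for one visitor, given the times of their page views."""
    sessions = 0
    session_expires = None
    for viewed_at in sorted(page_view_times):
        # No live session cookie: a new session starts with this page view.
        if session_expires is None or viewed_at >= session_expires:
            sessions += 1
        # Every page view rewrites the cookie to expire 30 minutes later.
        session_expires = viewed_at + SESSION_TIMEOUT
    return sessions


def is_returning_visitor(last_visit, current_visit):
    """True if the persistent cookie set at last_visit has not yet expired."""
    return current_visit - last_visit < VISITOR_LIFETIME


# Hypothetical page views by a single browser on one day.
views = [
    datetime(2010, 3, 10, 10, 0),
    datetime(2010, 3, 10, 10, 10),
    datetime(2010, 3, 10, 10, 35),
    datetime(2010, 3, 10, 11, 30),
]
session_count = count_sessions(views)
```

In this sample the first three page views fall into one session, because each arrives before the cookie would have expired, while the 11:30 view arrives after a gap of more than 30 minutes and starts a second session.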
Also, the use of Google Analytics for government websites was historically delayed due to Google’s use of persistent cookies, based on a policy issued by memorandum M-00-13 of the Federal Office of Management and Budget (OMB).
What are the generalized privacy issues involved with Google Analytics?
Over ten years ago, Lawrence Lessig wrote in Code about the additional step users must take to block cookies, an essential component of analytics programs: “With one click, you can disable the deposit of cookies…. [b]ut this privacy comes at a cost. Users who choose this option are either unable to use [websites] where cookies are required or forced constantly to choose whether a cookie will be deposited. Most find the hassle too great and simply accept cookies on their machine.”
But that is not to say that those who want to prevent sharing of their information through web analytics cannot do so, especially with the upcoming browser add-on. However, it is likely that very few people will continue to take what Lawrence Lessig in Code 2.0 calls “extraordinary steps” to protect their information:
Unless you’ve taken extraordinary steps – installing privacy software on your computer, or disabling cookies, etc. – there’s no reason you should expect that the fact that you visited certain sites, or ran certain searches, isn’t knowable by someone. It is. The layers of technology designed to identify “the customer [or user]” have produced endless layers of data that can be traced back to you.
This is also not to say that users want to block the tracking of their online behavior. Many users are aware of tracking cookies and understand their use and importance as a tool for improving the online user experience.
danah boyd also divides up personal information in a way that applies to how people view most of the information shared via web analytics programs:
If you’ve spent any time thinking about privacy, you’ve probably heard of PII – “Personally Identifiable Information.” All too often, we assume that when people make PII available publicly that they don’t care about privacy. While some folks are deeply concerned about PII, PII isn’t the whole privacy story. What many people are concerned about is PEI – “Personally Embarrassing Information.” This is what they’re brokering, battling over, and trying to make sense of.
The opt-in/opt-out issue for online information disclosure helps show why Google Analytics can be so problematic: there is a gap between actual behavior and idealized or legally expected behavior. When people use websites they do not read terms of service. After all, once a website has loaded, it is likely too late to avoid receiving a cookie or being tracked by another service. We discuss the confusion over what is truly opt-in and opt-out within the context of web analytics further below.
Web Analytics and Government Information
The first official full government statement regarding the type of information that can be tracked via web analytics was created in 2000. Also, the use of Google Analytics for government websites was historically delayed due to Google’s use of persistent cookies as mentioned in the Memorandum for the Heads of Executive Departments and Agencies: Privacy Policies and Data Collection on Federal Web Sites. Considering that this Memorandum strongly discouraged the use of web analytics through preventing the use of persistent cookies, those who wanted to use these programs were stymied.
This means that federal websites are currently prohibited from using persistent tracking technologies, such as those used by web analytics programs like Google Analytics, unless the agency head gives permission after the agency demonstrates “the use of persistent tracking technology for a compelling need” and follows several other steps. Interestingly, one site that has waived the ban on tracking is Whitehouse.gov, as of 2009.
In response to many years of web analytics going largely unused on government websites, in May 2009, the Center for Democracy & Technology (CDT) and the Electronic Frontier Foundation (EFF) released a joint paper, Open Recommendations for the Use of Web Measurement Tools on Federal Government Web Sites.
In response to these recommendations and the concerns of others, in July 2009, the Office of Management and Budget proposed allowing federal agencies to use online tracking technologies on their websites, after posting “clear and conspicuous” notifications and opt-outs.
The proposed Google browser opt-in/opt-out option may obviate some of the concerns of those interested in preventing web analytics, and Google Analytics specifically, from being used by the federal government. Because people will be given an option to avoid having their persistent cookies tracked, even if few use it, government agencies may be able to argue that they have protected the privacy of visitors to their websites, even within the present privacy standards. However, as discussed above, the fact that a privacy-protecting option is available to users does not mean that real-world privacy is actually being protected.
Non-Government Information in the United States
The United States has no overall privacy agency, but at least for online privacy, the Federal Trade Commission is slowly becoming the body that regulates Internet privacy for non-government entities. The FTC is taking on this role primarily through its Consumer Protection division. Notably, the FTC itself does not use the type of information that would be used in web analytics, and states on its website, in bold, that “We do not use persistent cookies.”
The Federal Trade Commission is likely, based on hearings, to adopt additional regulations or guidelines. The FTC is concerned about both cookies and the tracking of IP addresses, so it is possible that analytics programs may be at risk, though they are not likely to be the focus of the FTC.
What will happen is uncertain, but we at least know that the present standard of notification as the baseline for privacy online will not continue to be the standard. Current FTC Chair Jon Leibowitz stated that “[w]e all agree that consumers don’t read privacy policies” and the notice and choice regime hasn’t “worked quite as well as we would like.” Considering that most users of websites are unaware of web analytics, this is a step in the right direction in understanding how people actually share information online, knowingly and unknowingly.
European Union
The EU’s approach to data protection has a broad scope for privacy. These laws cover all types of personal data, whether or not it is consumer data. Some of the E.U. standards include the OECD Guidelines on the Protection of Privacy and Transborder Flows of Personal Data, which are based around set principles and guidelines to streamline common privacy standards and to allow for transborder data transfer. The principles include openness, collection limitation, data quality, purpose specification, use limitation, security safeguards, and individual participation in data protection. The OECD even has a privacy policy statement generator on its website.
The other major privacy directive at issue with analytics programs in the E.U. is the European Union Directive on Privacy and Electronic Communication (2002/58/EC). In October 2009, this Directive was modified to require that website users opt in to tracking cookies.
Google’s recently announced opt-out/opt-in browser addition may avoid some of the privacy complications, while making data collection less accurate. The amendments allow consent to be obtained from users through a specific mechanism, which Google has now embraced so that Google Analytics can continue to be used alongside the new opt-in/opt-out browser feature. The new Directive states that:
“Where it is technically possible and effective, in accordance with the relevant provisions of [EU Data Protection] Directive 95/46/EC, the user’s consent to processing may be expressed by using the appropriate settings of a browser or other application.”
Would the Google opt-in/opt-out browser option be acceptable under this new interpretation? Some commenters think it might not be enough, while others think it may be. The disagreement seems to come down to whether one follows the letter of the Directive, under which a browser-based opt-out from data collection may be sufficient, or looks at the question from the perspective of user experience.
A browser opt-in to allow for analytics use, while likely legal at this point, does not entirely serve the interests of either group: those who wish to protect privacy unless a user has specifically opted in with complete knowledge, or those who wish web analytics to be as accurate as possible.
Germany
Despite Germany’s role in promoting privacy, the use of analytics programs, and specifically Google Analytics, is widespread in Germany. According to one article, about 13% of German website owners (sites that end with .de) currently use Google Analytics, including major businesses such as media and drug companies, as well as political parties. One German study looked at 655,000 German web pages from 14,000 website providers to determine whether “a provider uses a statistics service like Google Analytics and declares this properly.”
In a February 18, 2010 statement by Germany’s federal data protection agency, the German federal data protection officer Peter Schaar informed health insurance companies that they are not permitted to use any web analytics program, leading about 100 health insurance companies to stop using such programs.
Even more than in the rest of Europe, Google’s opt-out browser option is likely to be met with skepticism by government officials in Germany, because very few users are likely to use it. On the other hand, considering the importance of Google Analytics generally and its present market share, those who use analytics programs will see this option as a means to avoid implementing other changes.
What is the future of Google Analytics?
In March 2010, Google announced the development of a browser-based opt-out option for Google Analytics, which would give visitors to websites with GA installed a choice about whether their behavior is tracked by the software. This development, while ostensibly a response to criticism from EU governments, may also have been a response to Google Analytics’ developing relationship with U.S. government departments. In February 2010, Google Analytics was approved for use on the apps.gov website, a resource for U.S. government-approved cloud computing applications.
Google Analytics’ proposed browser plug-in, while ostensibly a response to privacy critics, is highly unlikely to be used by the majority of online consumers. One concern that this proposal does solve is the need for different systems for different privacy models, but there are additional functionality-based options.
We believe that increased consumer education about how online visitor information is collected and used by web analytics software is the best way to ensure public accountability of the web analytics industry regarding privacy. This approach would do more to drive structural policy change and a dialogue on online user privacy than Google’s functionality-based approach, which very few consumers are likely to use. It will also be more effective than a strictly regulatory approach that may always be a step or two behind developing analytics technology.
Clear overview of Google’s struggle with the privacy act(s). I was looking for the issue your referring to (Next issue of Buffalo intellectual property law journal, including the captations) but wasn’t able to find it. I would like to read it as a source for an university assignment, I was wondering whether you could provide me with a copy of it?
Best regards,
Sebastiaan