Sunday, July 28, 2013

Do Not Track


Disclaimer: The opinions expressed in this post are, for better or for worse, my own and are not intended to reflect the policies or positions of my employer, Oracle, or those of the W3C.

If you have Do Not Track on your radar, you must have seen a number of news items and blog posts, most of them reporting that the Do Not Track initiative is deadlocked. See, for example, the Bloomberg article Web’s Mad Men Fight Browser Makers Over Online Tracking, which starts off by saying: "Yahoo!, AOL and other companies dependent on Internet ad revenue are fighting Web-browser makers, including Microsoft, over how to let consumers avoid being tracked online." See also the New York Times blog post Wrangling over 'Do Not Track', and review Don't Track Us and Dan Appelquist's blog.

If you have not been following, Do Not Track is a W3C working group that is attempting to standardize an HTTP header that indicates that the user does not want his visits to websites to be tracked and his personal data collected and shared with advertising networks. Other aspects of the proposed standard include a well-known location (URI) for providing a machine-readable tracking status resource that describes a service's DNT compliance, and an HTTP response header field for resources to communicate their compliance or non-compliance with the user's expressed preference. The Working Group has been meeting for about two years, and browser makers have enabled a Do Not Track option, some of them turning it on by default, but compliance from advertisers has yet to come.
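To make the header exchange concrete, here is a minimal Python sketch of how a server might read the user's preference and answer it. It assumes the field names used in the working group's drafts (a `DNT` request header where the value `1` means "do not track", and a `Tk` response header); the function names are hypothetical, and reducing the response to just `N` (not tracking) or `T` (tracking) is a simplification for illustration, not a full reading of the draft.

```python
def user_opted_out(request_headers):
    """Return True if the request carries DNT: 1.

    HTTP header names are case-insensitive, so compare in lower case.
    """
    for name, value in request_headers.items():
        if name.lower() == "dnt":
            return value.strip() == "1"
    return False


def tracking_status_header(request_headers):
    """Build a Tk response header (assumed values: 'N' = not tracking, 'T' = tracking)."""
    status = "N" if user_opted_out(request_headers) else "T"
    return {"Tk": status}
```

Of course, nothing in the protocol forces a server to behave this way; the header only expresses a preference, which is exactly why compliance is the sticking point.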

At the 2011 Web Tracking Workshop, one of the arguments advanced for starting the Do Not Track WG was that if the industry did not agree on a standard, one would be imposed on it by legislation.

In May 2011 the EU published an e-Privacy directive that requires websites to indicate on the page whether cookies are being used, where to go for more information, and how to give or withhold consent. If you visit, for example, the Guardian website, there is a banner right at the top that says cookies are being used and points you to a link that tells you more about how the Guardian uses cookies. There is no such legislation in sight for the U.S.

Another option is, of course, self-regulation or voluntary compliance. Have you seen the AdChoice icon?


This is brought to us by the Digital Advertising Alliance (DAA), a coalition of advertisers, publishers, and marketers that has been working to increase transparency on the Web and create controls for online advertising. This clickable icon floats near ads and is meant to give users information about targeted ads and the data collected by ads. It also gives users a Do Not Track option. Now, the AdChoice icon is coming to mobile browsers.

The DAA says that the AdChoice icon is used in 30 countries. I have not seen a lot of it on the websites I frequent, but that may be just me and where I walk.

On July 26, 2013, the New York Times reported agreement by a variety of groups, including app developers and consumer advocates, to test a voluntary code of conduct that would require participating app developers to offer notices about whether their apps collect certain personal data from users or share user-specific data with entities like advertising networks or consumer data resellers.

So, perhaps, we will end up with self-regulation; better than nothing but not really enough. Self-regulation may stave off legislation, but it is unenforceable and it depends upon the cooperation and goodwill of advertisers :-(

Saturday, July 20, 2013

Linking and the Law

Disclaimer: The opinions expressed in this post are, for better or for worse, our own and are not intended to reflect the policies or positions of our employers or those of the W3C.
Ashok Malhotra (Oracle), Larry Masinter (Adobe) --some thoughts, based on "Publishing and Linking on the Web" co-authored with Jeni Tennison and Dan Appelquist for the W3C TAG.
Also published as Larry Masinter's blog

If you type a Web address into your browser you will most likely be taken to a Web page consisting of text and images. This is less true now than it used to be. Today, you may be taken to a game where you can pretend to be a race car driver or throw stones at pigs but still, in most cases, you will get a Web page. From the information on the page you may be able to access related material by simply clicking. This capability is what makes the Web the Web.

If you are creating a Web page you can use material from other sources in different ways. You can provide a link to the material or you can embed it -- include or transclude (a wonderful word coined by the Hypertext visionary Ted Nelson) -- within your material. To include material, you copy it into your Web page. To transclude it, you provide a reference to the material and it is included as part of your Web page when it is rendered. Inclusion and transclusion, as opposed to linking, are quite different and are treated differently by courts.

Here is a page from Wikipedia that includes the picture of a whale from another web site:



The above page is from "http://en.wikipedia.org/wiki/Blue_whale" and if you click on the image in Wikipedia it tells you where the image came from and that it is in the public domain "because it contains materials that originally came from the U.S. National Oceanic and Atmospheric Administration, taken or made as part of an employee's official duties."

With embedding you see the embedded content on the page. Linking, on the other hand, requires a user action. The link, often rendered in another color, requires the user to click on it, and when she does it takes her to another Web page. But there are advantages to inclusion vs. linking or transclusion. If you include material, that material is not going to change out from under you, whereas material at the end of a link or material you transclude may change. In the worst case, it could be replaced by malware or a virus.

Legal cases

In recent years there has been a rash of legal cases relating to linking and embedding. There was, for example, the case of Richard O'Dwyer -- a student who resides in the UK and was facing possible extradition to the US for posting links on a Web site, which itself is not US-based and is not primarily intended for US users, to material that the US considers to be copyrighted. (The situation also raises the question of jurisdiction, but more on that later.)

A broad general principle seems to be the notion of agency. If you link to something, you're less responsible for it being available than if you transclude it; if you transclude something, you're less responsible than if you include it (transclude a copy you made). Most of the questions are whether you're responsible for making information available that people don't want shared (bomb making, pornography, copyright infringement). If you do decide to embed, the material should be attributed and, unless it is a brief quote, requires permission; otherwise, you may be held responsible for copyright violation.

Linking, sometimes called hyperlinking, is generally allowed -- the argument has been made in several places that restricting linking is like interfering with free speech. Tim Berners-Lee argued in an early design document that a standard hyperlink is nothing more than a reference or footnote, and that the ability to refer to a document is a fundamental right of free speech. Others have argued similarly that a link is like telling you where you can find a particular book in a library or where you can go and watch a particular movie. When you go to the library you may, in fact, steal the book, but that is not instigated by the link.

Jennifer Kyrnin, in The Legalities of Linking -- Web Links and the Law, says "There have been one or two cases in the United States that imply that the act of linking without permission is legally actionable, but these have been overturned every time they come up."

But, still, you need to be careful.

The words accompanying a link can express an opinion -- for example the HTML code:
<p> <a href="http://www.joes.bar/menu.html">Joe's Bar</a> has great food! </p>

which renders as:

Joe's Bar has great food!

links "Joe's Bar" to the bar's menu -- but some opinions may be construed as defamatory or libellous.

Consider: "Why pay for the new Daft Punk song when you can download it for free at http://copiedsongs.example.org/daftp/getlucky?"

In other words, not only can the text around a link result in libel, the use of a link does not in general make otherwise illegal text legal. And then again, the material you link to may be so inflammatory that even minor responsibility might be risky; it's best not to link to Nazi propaganda, child pornography or “How to Make a Bomb”. Web media has been very effective in political campaigns, but if you link to political material it may be judged to be seditious by some governments, and you may be held responsible.

Restricting Linking

Even though linking, in general, does not violate copyright, some sites may want to restrict linking to all or part of their content.

The Digital Reader article Irish Newspaper Collective Wants to Charge License Fees for Links ridicules the attempt to charge for merely giving directions on where to find information. But the request for payment is understandable. If you are a newspaper that invests in creating original content you would like to monetize your investment. The New York Times now allows a certain number of links per month. The Wall Street Journal requires you to subscribe. Other news media have similar policies. So, a link may tell you where to find a book but the library may charge a fee or be accessible only via membership.

Incidentally, the original links to The Digital Reader article ceased to work. While a link may not violate copyright, publishers have the right to restrict linking and may impose a number of conditions such as pay barriers or age verification that must be satisfied before a link is followed.

Restricting Deep Linking

Many web sites restrict deep linking, i.e. links to pages other than the top page, because this allows links to bypass advertising or the legal Terms and Conditions, or because a deep link may leave the source of the material unclear. Often, legal Terms and Conditions are used to restrict deep linking, but such terms are difficult to enforce and there are simple technical mechanisms that are more effective. See footnote 1.

Jurisdiction

The World Wide Web is truly an international phenomenon and, as we have discussed, linking has been compared to freedom of speech. But there are limits to freedom of speech and, as we discuss above, some uses of external material may lead to legal action. If I live in the US and host a web site in a Scandinavian country that has links to offensive material, where could I be prosecuted? If I host a website in a country that does not have a bilateral copyright agreement with the US and the website includes swaths of US copyrighted material, can I be prosecuted? If so, where? In the case of certain kinds of international disputes, there are agreements that such disputes will be settled by mediation or arbitration. Perhaps we need to formalize a similar capability for the Web.

Summary

Linking to material that did not originate with you is an essential feature of the Web and one that gives it much of its power. In general, linking to other material, as opposed to inclusion or transclusion, is safe and carries little risk but, as we explain above, you still need to be careful.

----------------------
1 It is straightforward to prevent linking to pages by not giving them URLs or by making the URLs undiscoverable. Deep links can also be intercepted by using the HTTP Referer header (the field name really is spelled that way), which indicates the page from which the request was linked. If that page is not on your own site, you can redirect to your site's home page, for example.
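The referrer check described above might look like the following Python sketch. The host name, the home page path and the function name are all hypothetical; a real site would wire this logic into whatever web server or framework it uses.

```python
from urllib.parse import urlsplit

SITE_HOST = "www.example.org"  # hypothetical: your own site's host name
HOME_PAGE = "/"                # hypothetical: where off-site visitors are sent


def route_request(path, headers):
    """Return (status, location) for a request.

    Pages are served only when the request was linked from our own site
    (or is for the home page itself); everything else is redirected.
    """
    referer = headers.get("Referer", "")
    referring_host = urlsplit(referer).hostname  # None when the header is absent
    if path == HOME_PAGE or referring_host == SITE_HOST:
        return (200, path)
    return (302, HOME_PAGE)
```

Note that browsers may omit the header and clients can forge it, so this is a deterrent rather than an access control.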

It is also possible to do this check in JavaScript, which can then be used to bring up an interactive dialog window to check whether the contractual terms have been read, to confirm that the user is over 18, or to ask for a password.

You can also use a cookie to, for example, start a session only when a page is accessed through a given gateway page and reject or provide an alternative path for requests that don't have the cookie set.
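As a rough sketch of the cookie approach, the Python below sets a cookie when the gateway page is visited and sends any request without that cookie back to the gateway. The gateway path, the cookie name and the `handle` function are hypothetical, chosen only for illustration.

```python
from http.cookies import SimpleCookie

GATEWAY_PATH = "/terms"          # hypothetical gateway page
SESSION_COOKIE = "seen_gateway"  # hypothetical cookie name


def handle(path, cookie_header=""):
    """Return (status, extra_headers) for a request."""
    if path == GATEWAY_PATH:
        # Visiting the gateway starts the session by setting the cookie.
        cookie = SimpleCookie()
        cookie[SESSION_COOKIE] = "1"
        cookie[SESSION_COOKIE]["path"] = "/"
        return (200, {"Set-Cookie": cookie[SESSION_COOKIE].OutputString()})

    cookies = SimpleCookie(cookie_header)
    if SESSION_COOKIE in cookies and cookies[SESSION_COOKIE].value == "1":
        return (200, {})  # the visitor came through the gateway
    # No cookie: offer the alternative path through the gateway page.
    return (302, {"Location": GATEWAY_PATH})
```

A production site would use an unguessable session identifier rather than a fixed value, since anyone can set `seen_gateway=1` by hand.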

The User-Agent HTTP header, which indicates the identity of the software making the request, is particularly useful in preventing access from web crawlers and search engines. A robots.txt file on the web site can be used to ask crawlers and search engines not to fetch deep pages, which keeps deep links out of search results.
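Both mechanisms can be sketched in a few lines of Python. The robots.txt content, the `/articles/` path and the list of tokens used to recognize crawlers are assumptions made for illustration; `RobotFileParser` from the standard library shows how a well-behaved crawler would interpret the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is asked to stay out of /articles/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /articles/
"""

# Hypothetical substrings that suggest an automated client.
CRAWLER_TOKENS = ("bot", "crawler", "spider")


def crawler_may_fetch(user_agent, url):
    """Answer as a compliant crawler would after reading ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


def looks_like_crawler(user_agent):
    """Server-side check on the User-Agent header of an incoming request."""
    lowered = user_agent.lower()
    return any(token in lowered for token in CRAWLER_TOKENS)
```

Keep in mind that robots.txt is purely advisory and the User-Agent string is self-reported, so both only work against software that chooses to cooperate.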

The domain name or IP address of the client making the connection can also be used to prevent specific users from accessing material.
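For the IP address case, a minimal Python sketch using the standard `ipaddress` module could look like this. The blocked network is a reserved documentation range standing in for a real blocklist, and the function name is hypothetical.

```python
import ipaddress

# Hypothetical blocklist; 203.0.113.0/24 is a range reserved for documentation.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_blocked(client_ip):
    """Return True if the connecting client's address falls in a blocked network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in BLOCKED_NETWORKS)
```

For example, `is_blocked("203.0.113.7")` is true while `is_blocked("198.51.100.1")` is not. Users behind proxies or shared addresses make this a blunt instrument.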