Disclaimer: The opinions expressed in this post are, for better or for worse, our own and not intended to reflect the policies or positions of our employers or those of the W3C.
Ashok Malhotra (Oracle) and Larry Masinter (Adobe) -- some thoughts based on "Publishing and Linking on the Web", co-authored with Jeni Tennison and Dan Appelquist for the W3C TAG.
Also published on Larry Masinter's blog.
If you type a Web address into your browser you will most likely be taken to a Web page consisting of text and images. This is less true now than it used to be. Today, you may be taken to a game where you can pretend to be a race car driver or throw stones at pigs but still, in most cases, you will get a Web page. From the information on the page you may be able to access related material by simply clicking. This capability is what makes the Web the Web.
If you are creating a Web page, you can use material from other sources in different ways. You can provide a link to the material, or you can embed it -- include or transclude it (transclude is a wonderful word coined by the hypertext visionary Ted Nelson) -- within your material. To include material, you copy it into your Web page. To transclude it, you provide a reference to the material, and it is pulled into your Web page when the page is rendered. Inclusion and transclusion are quite different from linking, and courts treat them differently.
For example, the Wikipedia page http://en.wikipedia.org/wiki/Blue_whale includes a picture of a whale from another web site. If you click on the image, Wikipedia tells you where the image came from and that it is in the public domain "because it contains materials that originally came from the U.S. National Oceanic and Atmospheric Administration, taken or made as part of an employee's official duties."
With embedding, you see the embedded content on the page. Linking, on the other hand, requires a user action: the link, often rendered in a different color, has to be clicked, and when the user clicks it, she is taken to another Web page. But inclusion has advantages over both linking and transclusion. If you include material, it is not going to change out from under you, whereas the material at the end of a link, or material you transclude, may change. In the worst case, it could be replaced by malware or a virus.
Legal Cases
In recent years there has been a rash of legal cases relating to linking and embedding. There was, for example, the case of Richard O'Dwyer -- a student living in the UK who faced possible extradition to the US for posting links to material that the US considers copyrighted, on a Web site that is not itself US-based and is not primarily intended for US users. (The situation also raises the question of jurisdiction, but more on that later.)
A broad general principle seems to be the notion of agency. If you link to something, you're less responsible for its being available than if you transclude it; if you transclude something, you're less responsible than if you include it (that is, transclude a copy you made yourself). Most of the questions concern whether you're responsible for making available information that people don't want shared (bomb making, pornography, copyright infringement). If you do decide to embed, you should attribute the material and, unless it is a brief quote, obtain permission; otherwise, you may be held responsible for copyright violation.
Linking, sometimes called hyperlinking, is generally allowed -- the argument has been made in several places that restricting linking is like interfering with free speech. Tim Berners-Lee argued in an early design document that a standard hyperlink is nothing more than a reference or footnote, and that the ability to refer to a document is a fundamental right of free speech. Others have similarly argued that a link is like telling you where you can find a particular book in a library, or where you can go to watch a particular movie. When you go to the library you may, in fact, steal the book, but the link did not instigate the theft.
Jennifer Kyrnin, in The Legalities of Linking -- Web Links and the Law, says, "There have been one or two cases in the United States that imply that the act of linking without permission is legally actionable, but these have been overturned every time they come up."
But, still, you need to be careful.
The words accompanying a link can express an opinion -- for example the HTML code:
<p> <a href="http://www.joes.bar/menu.html">Joe's Bar</a> has great food! </p>
Which renders as:
Joe's Bar has great food!
Or consider: "Why pay for the new Daft Punk song when you can download it for free at http://copiedsongs.example.org/daftp/getlucky?"
In other words, not only can the text around a link result in libel, but the use of a link does not, in general, make otherwise illegal text legal. Moreover, the material you link to may be so inflammatory that even minor responsibility is risky; it's best not to link to Nazi propaganda, child pornography or "How to Make a Bomb". Web media have been very effective in political campaigns, but if you link to political material, some governments may judge it seditious and hold you responsible.
Restricting Linking
Even though linking, in general, does not violate copyright, some sites may want to restrict linking to all or part of their content.
The Digital Reader article "Irish Newspaper Collective Wants to Charge License Fees for Links" ridicules the attempt to charge for merely giving directions to where information can be found. But the request for payment is understandable: a newspaper that invests in creating original content would like to monetize that investment. The New York Times now allows only a certain number of free articles per month, and The Wall Street Journal requires you to subscribe. Other news media have similar policies. So a link may tell you where to find a book, but the library may charge a fee or be accessible only to members.
Incidentally, the original links to The Digital Reader article ceased to work. While a link may not violate copyright, publishers have the right to restrict linking and may impose a number of conditions such as pay barriers or age verification that must be satisfied before a link is followed.
Restricting Deep Linking
Many web sites restrict deep linking, i.e., links to pages other than the top page, because deep links let visitors bypass advertising or the legal Terms and Conditions, or because a deep link may leave the source of the material unclear. Legal Terms and Conditions are often used to restrict deep linking, but not only are such terms difficult to enforce, there are also simple technical mechanisms that are more effective. See note 1.
Jurisdiction
The World Wide Web is truly an international phenomenon and, as we have discussed, linking has been compared to freedom of speech. But there are limits to freedom of speech and, as discussed above, some uses of external material may lead to legal action. If I live in the US and host a web site in a Scandinavian country that has links to offensive material, where could I be prosecuted? If I host a web site in a country that does not have a bilateral copyright agreement with the US, and the site includes swaths of US copyrighted material, can I be prosecuted? If so, where? For certain kinds of international disputes, there are agreements that such disputes will be settled by mediation or arbitration. Perhaps we need to formalize a similar capability for the Web.
Summary
Linking to material that did not originate with you is an essential feature of the Web and one that gives it much of its power. In general, linking to other material, as opposed to inclusion or transclusion, is safe and carries little risk but, as we explain above, you still need to be careful.
1 It is straightforward to prevent linking to pages by not giving them URLs or by making the URLs undiscoverable. Deep linking can also be blocked using the HTTP referrer header (spelled "Referer" in the protocol), which indicates the page from which the request was made. If that page is not on your own site, you can, for example, redirect the request to your site's home page.
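A minimal sketch of the referrer check, using only Python's standard library (http.server and urllib.parse). The host name www.example.org, the handler DeepLinkGuard and the helper came_from_own_site are hypothetical names chosen for illustration. Requests without a Referer are treated as external; browsers and privacy tools often strip the header and any client can forge it, so this discourages casual deep linking rather than enforcing anything.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical host name of "your own site"; substitute your real one.
OWN_HOST = "www.example.org"


def came_from_own_site(referer: Optional[str], own_host: str = OWN_HOST) -> bool:
    """True if the Referer header names a page on our own host."""
    if not referer:
        # No Referer (typed URL, bookmark, privacy setting): treated as external.
        return False
    return urlsplit(referer).hostname == own_host


class DeepLinkGuard(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # HTTP spells the header "Referer".
        external = not came_from_own_site(self.headers.get("Referer"))
        if self.path != "/" and external:
            # A deep link from another site: redirect to the home page.
            self.send_response(302)
            self.send_header("Location", "/")
            self.end_headers()
            return
        body = f"Serving {self.path}\n".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port: int = 8000) -> None:
    """Serve on localhost until interrupted."""
    HTTPServer(("localhost", port), DeepLinkGuard).serve_forever()
```

Calling run() serves on localhost:8000. A request for /articles/42 that arrives with no Referer, or with one from another site, is redirected to /, while the same request made from a page on www.example.org is served.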
You can also use a cookie to, for example, start a session only when a page is accessed through a given gateway page, and then either reject requests that don't have the cookie set or send them along an alternative path.
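The gateway-cookie approach can be sketched the same way. This is an illustration under stated assumptions, not a real session implementation: the gateway path /enter, the cookie name entered_via and its value are hypothetical, and respond() simply returns a status code and headers instead of talking to a live server. Cookie headers are built and parsed with the standard-library http.cookies.SimpleCookie.

```python
from http.cookies import CookieError, SimpleCookie
from typing import Dict, Optional, Tuple

# Hypothetical gateway page and cookie name/value.
GATEWAY_PATH = "/enter"
GATEWAY_COOKIE = "entered_via"
GATEWAY_VALUE = "gateway"


def gateway_set_cookie() -> str:
    """Value for the Set-Cookie header sent by the gateway page."""
    cookie = SimpleCookie()
    cookie[GATEWAY_COOKIE] = GATEWAY_VALUE
    cookie[GATEWAY_COOKIE]["path"] = "/"       # valid for the whole site
    cookie[GATEWAY_COOKIE]["httponly"] = True  # not readable by page scripts
    return cookie[GATEWAY_COOKIE].OutputString()


def has_gateway_cookie(cookie_header: Optional[str]) -> bool:
    """True if the request's Cookie header carries the gateway cookie."""
    if not cookie_header:
        return False
    cookie = SimpleCookie()
    try:
        cookie.load(cookie_header)
    except CookieError:
        return False
    morsel = cookie.get(GATEWAY_COOKIE)
    return morsel is not None and morsel.value == GATEWAY_VALUE


def respond(path: str, cookie_header: Optional[str]) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a request."""
    if path == GATEWAY_PATH:
        # Visiting the gateway starts the "session" by setting the cookie.
        return 200, {"Set-Cookie": gateway_set_cookie()}
    if has_gateway_cookie(cookie_header):
        return 200, {}
    # No cookie: send the visitor through the gateway instead of rejecting.
    return 302, {"Location": GATEWAY_PATH}
```

A first request for /articles/42 is redirected to /enter; once the browser has received the cookie there, the same request is served. Returning 403 instead of the redirect would implement the "reject" option.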
The User-Agent HTTP header, which identifies the software making the request, is particularly useful in preventing access from web crawlers and search engines. A robots.txt file on the web site can be used to keep well-behaved crawlers and search engines away from deep pages. The domain name or IP address of the client making the connection can also be used to prevent specific users from accessing material.
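The remaining mechanisms can be sketched together, again as an illustration rather than a recommended configuration. The robots.txt rules, the crawler markers matched in the User-Agent string and the blocked address range (203.0.113.0/24, a block reserved for documentation) are all hypothetical. Crawlers read and obey robots.txt voluntarily, using logic like the standard-library urllib.robotparser.RobotFileParser shown here, so the server-side User-Agent and IP checks are what actually refuse a request. User-Agent strings can be set to anything, so that check is a deterrent, not a guarantee.

```python
import ipaddress
from typing import Optional
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to stay out of /articles/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /articles/
"""

# Hypothetical substrings that identify crawlers in a User-Agent header.
CRAWLER_MARKERS = ("googlebot", "bingbot", "crawler", "spider")

# Hypothetical blocked range (203.0.113.0/24 is reserved for documentation).
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]

# Parse the rules once; modified() records that the rules are loaded, since
# can_fetch() refuses everything until it believes robots.txt was read.
_robots = RobotFileParser()
_robots.parse(ROBOTS_TXT.splitlines())
_robots.modified()


def robots_allows(user_agent: str, path: str) -> bool:
    """What a well-behaved crawler would conclude from robots.txt."""
    return _robots.can_fetch(user_agent, path)


def looks_like_crawler(user_agent: Optional[str]) -> bool:
    """Crude User-Agent test; the header is trivially forged."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in CRAWLER_MARKERS)


def ip_blocked(client_ip: str) -> bool:
    """True if the client address falls in a blocked network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


def should_refuse(path: str, user_agent: Optional[str], client_ip: str) -> bool:
    """Server-side decision combining the three mechanisms."""
    if ip_blocked(client_ip):
        return True
    # Refuse crawlers that request paths robots.txt asked them to skip,
    # whether or not they honored the file themselves.
    if looks_like_crawler(user_agent) and not robots_allows(user_agent or "", path):
        return True
    return False
```

Ordinary browsers may follow deep links here; only clients that identify as crawlers, or that connect from the blocked range, are refused.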