A very problematic trend in copyright law has emerged over a series of cases in recent years. Courts have held that creating a copy of website code in a browser’s cache constitutes copyright infringement. The problem is that copying site code into the browser is the only way to browse the internet at all. This leaves us with an absurd outcome: the current legal scheme essentially makes browsing the web a copyright violation.
In practice, website owners almost never pursue copyright claims on the basis of browser caching, which perhaps shows that this type of copying causes no real harm. The internet could not function if website owners regularly sued visitors merely for having viewed their content. But some companies have selectively leveraged copyright law when it suits them.
Loading a Website Requires Making a Copy of the Website
To surf the web, you probably open your browser (Firefox, Chrome, etc.) and type in the URL you want to visit. When you do this, your browser requests the page at that address and loads a copy of the website into its temporary memory. Keeping these temporary copies is called “caching.” Caching speeds up your browsing because the next time you go to the same page, your browser already has a stored copy of the content, so the page loads faster.
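To make the mechanism concrete, here is a minimal Python sketch of a browser-style cache. It is a deliberate simplification, not how any real browser is built: the `PageCache` class and the `fake_network` function are hypothetical stand-ins for a browser’s network stack. What it illustrates is the point above: the first visit necessarily produces a local copy of the page, and a repeat visit is served from that copy instead of the network.

```python
from typing import Callable, Dict


class PageCache:
    """A toy, in-memory page cache (hypothetical; real browsers also honor
    HTTP caching headers, expiry times, and storage limits)."""

    def __init__(self, fetch_from_network: Callable[[str], str]) -> None:
        self._fetch = fetch_from_network
        self._store: Dict[str, str] = {}  # URL -> local copy of the page code
        self.network_requests = 0

    def load(self, url: str) -> str:
        # Repeat visit: return the copy already stored locally.
        if url in self._store:
            return self._store[url]
        # First visit: download the page. Displaying it requires a local
        # copy, which is kept for the next visit.
        self.network_requests += 1
        page = self._fetch(url)
        self._store[url] = page
        return page


# A fake "network" (hypothetical) so the example runs offline.
def fake_network(url: str) -> str:
    return f"<html><body>Content of {url}</body></html>"


cache = PageCache(fake_network)
cache.load("https://example.com/")  # first visit: downloaded and copied
cache.load("https://example.com/")  # second visit: served from the copy
print(cache.network_requests)  # 1
```

Note that there is no code path that displays a page without first holding a copy of it; the cache only decides whether that copy is reused.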
Some browsers have privacy settings that automatically delete these temporary files when a user leaves the webpage. Others keep them until they are replaced by newer files as the user visits other pages. Regardless, every browser must load a copy of a website in order to display it to the user in the first place. Most users are unaware this even happens; it is simply an automatic process that lets them access webpages.
With that in mind, one would never think that the core mechanism by which every website is accessed could be copyright infringement. However, prior court decisions have suggested otherwise. Many court decisions have reached the unreasonable conclusion that loading pages into the browser cache constitutes copyright infringement. These conclusions are especially harmful when they affect new information-gathering technologies, including web scraping.
What Is Web Scraping?
A web scraper is an automated script that visits a website and collects certain types of data while throwing out the rest. For example, imagine you wanted to know the prices of all the cars being sold by a local dealership. One way of doing this would be to visit every page of its listings and manually write down the prices. A faster way would be to run a web scraping program that finds all the prices and records them for you automatically. Web scraping allows for more efficient collection and analysis of information.
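As a rough illustration of the dealership example, here is a minimal scraper sketch using Python’s standard-library `html.parser`. The page markup and the `class="price"` convention are hypothetical assumptions; real sites structure their pages differently. The sketch keeps the text of elements tagged as prices and discards everything else. Like a browser, it can only read the prices from a copy of the page’s code that it already has.

```python
from html.parser import HTMLParser
from typing import List


class PriceScraper(HTMLParser):
    """Collects the text of elements whose class attribute includes "price"
    and ignores the rest of the page. Assumes (hypothetically) that the
    price text sits directly inside the tagged element."""

    def __init__(self) -> None:
        super().__init__()
        self.prices: List[str] = []
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if "price" in classes:
            self._in_price = True

    def handle_endtag(self, tag):
        # Any closing tag ends the price element in this simple sketch.
        self._in_price = False

    def handle_data(self, data):
        if self._in_price and data.strip():
            self.prices.append(data.strip())


# Hypothetical markup for one page of a dealership's listings.
page = """
<ul>
  <li><h2>2018 Sedan</h2><span class="price">$18,500</span></li>
  <li><h2>2020 Pickup</h2><span class="price">$32,900</span></li>
</ul>
"""

scraper = PriceScraper()
scraper.feed(page)
print(scraper.prices)  # ['$18,500', '$32,900']
```

The model names in the `<h2>` headings are read by the parser but never recorded, which is the “throwing out the rest” part of scraping.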
Recently, some websites have complained to courts about this practice, arguing that web scrapers infringe the copyright in their web content, even when the information being gathered is not copyrighted (or even eligible for copyright). In one recent case, a web scraper run by hiQ Labs gathered information from individuals’ public LinkedIn profiles, analyzed the data, and generated reports about those users. LinkedIn demanded that hiQ stop accessing LinkedIn profiles to gather this information, claiming that doing so constituted a copyright violation. In a similar case, a program run by Power Ventures scraped Facebook users’ data with those users’ permission. In its initial complaint, Facebook alleged that Power Ventures infringed its copyright by loading a copy of Facebook’s site code whenever the web scraper visited its pages.
This is faulty reasoning: every web browser makes a copy of site code when accessing a website. Everyone who browses the web makes copies of sites in their browser, whether or not they use a scraper.
The Legal History of Web Caching
The legal issues surrounding web caching actually began with a case involving computer repair and software loading. In 1993, in MAI Systems Corp. v. Peak Computer Inc., a court held that when a third-party repair technician turned on a customer’s computer, temporarily loading its software from the hard drive into the computer’s memory (RAM) to run the program, that loading constituted copyright infringement.
This outcome was so problematic that Congress amended the copyright laws to ensure that temporarily loading software for maintenance or repair purposes would not result in a violation. The courts also addressed the MAI problem as it applies to television in Cartoon Network v. CSC Holdings, concluding that temporarily buffering copies of content for streaming does not violate copyright. However, neither of these remedies resolved the overarching issue created by MAI. Both the repair amendment and the Cartoon Network decision addressed only copies that are temporary. Neither established the fundamental principle that Congress did not intend copyright to prohibit the mechanical copying necessary to access content, regardless of how long the copies last. As a result, both leave out important cases like browser caching, because cached websites can persist much longer than the copies loaded into memory during a repair or held in a cable video buffer.
Unfortunately, courts applied the conclusion from MAI to web browsing in a case called Ticketmaster LLC v. RMG, where the court decided that cached copies of websites constituted copyright infringement. The outcome in MAI was problematic on multiple fronts, but in particular, it simply should not have been applied to web browsing. The very nature of a public website is to invite visitors at large, and to answer this invitation, visitors must load a copy of the website into their cache. Thus, creating and publishing a website is, by definition, an invitation to all visitors to load a copy of the site’s code into their computer’s memory. You cannot put up a public website without inviting users to copy it. Software, on the other hand, usually exists in discrete copies, and access is directed only toward particular authorized users. These two subjects differ inherently in how they are accessed and what they are intended for. It was therefore incorrect to apply the MAI reasoning to web browsing, and the outcome in Ticketmaster has affected many later technologies that involve browser caching, particularly web scraping.
Cached Copies of a Site Should Not Be Copyright Infringement
Common sense dictates that the nature of a particular medium be taken into account when deciding whether something is copyright infringement. The point of copyright is to protect the rights of creators in their work, not to prevent people from visiting webpages.
A person can walk past a store window and write down the prices they see. Visiting an online store and writing down its prices is the same thing. The only real difference is that visiting the website inherently requires loading the site code into the browser. It is simply unreasonable to apply copyright to browser caching when it is a basic function of how web browsing works. As such, technologies like web scraping should not be unlawful on the basis that they require making a copy of code in the browser cache.
Image credit: Wikimedia Commons user Fabio Lanari
About Meredith Whipple
Meredith is the Digital Content Manager at Public Knowledge, where she focuses on writing and communications for the organization. Meredith has an extensive background in internet policy, including previous positions at the Center for Democracy and Technology, Hewlett-Packard, Consumers Union, the Berkman Center for Internet and Society, and the Federal Communications Commission. Meredith earned her master’s degree in Public Affairs from the LBJ School of Public Affairs at the University of Texas at Austin, and her bachelor’s degrees in Communications and Political Science from the Ohio State University in Columbus. In her free time, Meredith is active in the performing arts in DC.