Google Book Search and Orphan Works

By Jef Pearlman on October 31, 2008 - 2:07pm

By now, you’ve probably heard about the proposed settlement in the Google Book Search lawsuit. By the terms of the deal (assuming it is approved by the court), Google gets the risk-free ability to scan, index, and in many cases, post portions of pretty much every book that will have come under U.S. copyright by the end of the year. Clearly this is a win for getting the public access to large swaths of books which would otherwise have been effectively lost to them. But is it really a win in the fight to make orphan works usable, as Professor Lawrence Lessig suggests? While it’s a step in the right direction (and has the benefits described), it’s not a very big one, nor is it enough to obviate the need for Congress to step in.

The short version of why is that while it helps Google index orphaned books and helps the public get access to them, it does nothing for non-book works or for anyone other than Google who wants to make use of orphaned books. The long version is detailed below.

What would this do for Orphan Works?

The settlement would do two main things, one for the availability of orphan works and one for the rights holders and those who want to license the works they control. First, it would give Google the ability to safely offer these works to the public, both in searchable form and as full, purchasable copies. This is a clear win for the public, who can not only find works that they didn’t know existed, but learn what libraries the works can be found in or even read them in their entirety without leaving their living rooms. It’s also a win for Google, which was already doing a lot of this, but facing potential copyright liability.

Second, it creates the Book Rights Registry, or BRR (please hold the chilling-effects puns), which gives authors a place to go to identify themselves and receive compensation for Google’s use of their books, including a portion of purchase and ad revenues. The creation of a BRR does good both for the authors of currently-orphaned works and for those who want to use them: It provides a way for authors to effectively un-orphan their books and receive compensation, and it provides a way for users to locate previously-unknown rights holders and obtain the rights to use those works. It also provides a financial incentive for those authors to come forward, as they will receive the compensation that the BRR has collected on their behalf.

So far it seems like a big win. So where’s the problem?

What wouldn’t this do for Orphan Works?

Other Types of Works

First, this settlement only helps out books and authors. There are all kinds of orphan works out there – not just large works like movies and songs, but old photographs and letters of unknown origin. These works are completely untouched by the settlement. And unless a large corporation finds a business model centered around them and an industry association sues them in a class action (see below), we’re unlikely to see a completely private-sector solution for those works any time soon.

Even in industries where there are rights organizations (ASCAP and BMI for musical works or SoundExchange for sound recordings), they only work for artists who have signed up, leaving other works orphaned and their creators out of the loop. (See page 15 of this letter from the College Art Association to the U.S. Copyright Office for some examples.) Likewise, registries like the IMDB, which are designed for maximum coverage but not to license works, still can’t identify or provide contact information for all rights holders even of the publicly known works they list. And the prospect of a registry for things like letters and personal recordings seems far away at best, and in all likelihood impossible.

The First-Sued Advantage

Second, this settlement only applies to Google. Even in the ideal case where the BRR offers similar rights to non-Google organizations on nondiscriminatory terms – a situation which I sincerely hope will come to pass – the BRR can only offer third parties licenses for those authors in the registry. It is only the opt-out nature of a class action lawsuit that allows the AAP and Authors Guild to license the rights of millions of rights holders who are not actively involved in the case and often don’t even know they have rights to defend. Short of getting sued and settling (in a non-collusive fashion), no one else can pull this off. And since the case didn’t go to judgment, anyone else who wants to make fair use of these works will face uncertain legal ground and the possibility of a massive copyright suit. (As a side note, this means that Google’s license and the BRR’s collection on behalf of missing authors isn’t really a private sector solution, as it would have been completely impossible without the court’s assistance.)

This structure effectively limits the BRR to authors represented by the AAP or the Authors Guild or those who individually register themselves. If I want to use an orphaned book, and the rights holder does not identify himself or herself in the BRR, then I’ll be in no better position after the settlement than I am right now. I will still run the risk of an expensive lawsuit if the rights holder shows up, and will have no way to mitigate that risk.

The Authors Guild represents more than 8,000 authors, and the AAP has over 300 member organizations covering an unknown number of authors. On the other hand, according to Brewster Kahle, founder of the Open Library (and member of our board), the over 20 million books listed on Open Library were written by over 5 million authors. Some of these, of course, wrote works that are now in the public domain, and others are represented by the BRR.

But how many of those 5 million authors of in-copyright books are unknown to Open Library, unreachable, and unrepresented? How many of these are going to register with the BRR? What about the artists in media other than books? As long as there are large quantities of works out there where the rights holders are unknown and unreachable, we will need congressional intervention to protect the use of those works. And while the Google Book Search settlement does give us access (through one party) to lots of works, and establishes a registry to help shrink the number of orphaned books, there is still a huge orphan works problem lurking out there and demanding a more comprehensive solution.

Despite these limitations, my inner bookworm is still happy that some sort of deal was reached. My first ever bit of geek activism was to volunteer time and money to the (still highly relevant and necessary!) Project Gutenberg. This takes us one step closer to the universal library—I just hope that it doesn’t end up owned and operated by just one company!

It’s still not clear to me exactly what kind of access people who buy access to the out-of-print but still copyrighted books will have. (A lot of them are not even orphaned—we know who owns the copyright—but simply abandoned.)

Currently, with public domain books you can download a full PDF of Google’s scan. This is contrary to what some of the reports on the deal say, such as this Financial Times article (which you can access without registering if you search the headline in Google News), which states that “accessing a book on Google’s service currently requires an internet browser and a handheld wireless device that is connected to the web.” I assume the “and” is just a mistake, but Google’s distribution of actual PDFs does seem to be frequently ignored.

I’d hope the access to these works isn’t limited to a web interface, or burdened with DRM. As though without DRM some out-of-print book from the 1940s on Incan pottery is going to be widely traded on BitTorrent. This isn’t just a “web interfaces suck” position: imagine what could be done with a smart DevonThink-like AI for entire books. Or the ability to create audiobooks automatically that comes with access to the actual text.

Hi, John.

I’m also excited about what this means for the books being scanned and made available to the public. I’d love some more competition in the space from Project Gutenberg and others, though, and I don’t think this agreement makes a big difference to those efforts.

With regard to in-copyright non-commercially-available books, my understanding is that you will have to use Google’s online viewer to read the books, but will have some (limited) copy/paste and printing abilities. I don’t believe there will be any change to the access to public domain books (i.e., you will still be able to get a PDF). And all of the books will be available for “non-consumptive research purposes” in some sort of semi-public archive.

Some details on reading can be found here and research here. You can also look at the full text of the agreement, which has a lot more details about how the research and previews work.