The Importance of Project Gutenberg


Project Gutenberg is one of the Internet's great resources--the first "digital library," with thousands of public domain ebooks, created entirely by volunteers. Its founder, Michael Hart, passed away this week. He started the project in 1971 by typing in a copy of the Declaration of Independence. In doing this, Hart invented the ebook, and what became Project Gutenberg release #1 is still available online. Hart's passing is a sad occasion, but it is also a good time to reflect on the importance of his life's work.

Permanence Requires Copying

Very few original texts from antiquity survive today. Most of what we know about Ancient Greek and Roman literature comes from copies, and copies of copies, and copies of copies of copies. No piece of paper, optical disk, or hard drive can be expected to survive indefinitely--the only way to safeguard the transmission of the written word to future generations is through massively redundant copying, with copies stored in as many different locations as possible. Thus the fact that Project Gutenberg is widely "mirrored"--stored on many different servers around the world--is important. It helps make sure that the effort that Hart and PG volunteers put into their work is not lost. But a copy is no good if you can't read it, which is why PG is committed to simple, open file formats--particularly plain text, which Hart called "Plain Vanilla ASCII." As Hart wrote, the reason for insisting on these formats is that "99% of the hardware and software a person is likely to run into can read and search these files."

Reflect on how different this approach is from the norm today. Other ebook publishers release their books in formats that are designed to discourage copying and that are tied to a single vendor's device or reader. There are commercial reasons for this, but it's apparent that Hart's approach, and not Amazon's or Apple's, is the one designed with the long-term preservation of literature in mind.

Digital Native

Project Gutenberg produces new, electronic editions of public domain works. With only a few exceptions for some notable editions, it does not just create electronic copies of print books. In fact, the words in a Gutenberg text might not perfectly reflect any particular printed book. This is an important distinction. It emphasizes the actual creative work more than any particular printed text and reminds us that literature is about words, not paper. It also stands in contrast to projects like Google Books that create, en masse, electronic copies of print books, using high-tech equipment and paid workers. There's room for both approaches--often Google Books will have a scan of some obscure 19th-century book missing from PG's library. But while Google Books may have surpassed Project Gutenberg in sheer numbers, the quality of the typical Gutenberg text, created by volunteers and proofread by human beings, is higher than that of the typical automated Google Books scan. And there's an important psychological consequence to PG texts being their own editions. PG texts show a commitment to digital and thus feel more "real" than mere scans, where the printed page is always more definitive. In a time when most people had a printed manual handy when they used a computer, Hart recognized the value of computers as an expressive medium in and of themselves.

Dedication to the Public Domain

Before there was Creative Commons, there was the public domain. Project Gutenberg not only limits its releases to works that are in the public domain in the United States, it also claims no copyright at all in those releases. The work that PG and its volunteers do is very hard and time-consuming, but as PG states, "non-authorship activities do not create a new copyright." It specifically disclaims any copyright interest arising from such activities as "scanning and optical character recognition (OCR), proofreading and OCR error correction, fixing spelling and typography, including substantial updates to spelling such as changing from American to British English" and so forth. In a time when broadcasters want to claim a property interest in content merely for having broadcast it, and when some people demand a "copyright" (rather than credit) for having scanned historical images, PG's clear and specific statement that copyright belongs only to creators is as admirable as it is rare.

Because PG works are in the public domain, and because of their uniformly high quality, they have become an invaluable resource for other projects--PG even keeps a (partial) list of the different uses its works are put to. PG texts can be easily converted into accessible forms for the disabled, and volunteers for LibriVox create free audiobooks of Project Gutenberg works. Additionally, the Kindle, Nook, and iBook stores are full of books that can be traced back to PG texts. Thanks to ereaders, smartphones, and tablets, ebooks are more popular today than ever before, and millions of people have probably read ebooks that are derived from PG texts without knowing what they owe to Michael Hart and the thousands of PG volunteers.

Hart's Influence

Many other large Internet projects follow in Hart's footsteps. Like Project Gutenberg, Wikipedia relies on volunteer effort. Like Project Gutenberg, the Internet Archive is dedicated to the digital preservation of culture. Commercial and nonprofit efforts like the Million Book Project and Google Books can trace their ideas to Hart. A true visionary, Michael Hart started something larger than himself, and both his ideas and Project Gutenberg itself have a long future ahead of them.

