The Internet Archive, founded in 1996, is a 501(c)(3) nonprofit organization and digital library that provides free access to digital materials, collecting websites, images, audio, video, software programs, books, and more. A pioneer in digital media preservation and archiving, the Internet Archive holds many unique collections. To highlight a few: the Bay Area Reporter Archives, which preserve the oldest continuously published LGBTQ weekly newspaper in the United States; the Marion Stokes Archive, containing more than 71,000 home-recorded videotapes that cover 33 years of the 24/7 TV news cycle; The Internet Arcade, a collection of coin-operated video games from the 1970s through the 1990s, emulated from their original software; the Ferguson Tweets archive, a collection of tweets mentioning Ferguson, Missouri, posted between August 10 and August 27, 2014, following the death of Michael Brown; and The Whole Earth Catalog Archive.
The Internet Archive was early to stress the importance of digital collections and the preservation of digital media. In a 1997 article for Scientific American, Internet Archive founder Brewster Kahle explained the need to preserve digital heritage, citing the instability of websites and digital documents. He pointed to burgeoning new services and new means of studying culture and history, while drawing parallels between digital and print collections:
If the example of paper libraries is a guide, this new resource will offer insights into human endeavor and lead to the creation of new services. Never before has this rich a cultural artifact been so easily available for research. Where historians have scattered club newsletters and fliers, physical diaries and letters, from past epochs, the World Wide Web offers a substantial collection that is easy to gather, store, and sift through when compared to its paper antecedents. Furthermore, as the Internet becomes a serious publishing system, then these archives and similar ones will also be available to serve documents that are no longer "in print".
Other organizations, such as HathiTrust and the Digital Public Library of America (DPLA), also provide digital collections with a strong commitment to public access and education. The Internet Archive, however, remains the leading institution for preserving born-digital materials, and now offers a monumental collection of 330 billion webpages, 20 million books and texts, 4.5 million audio recordings, 4 million videos, 3 million images, and 200,000 software programs. As more of our cultural heritage is created, documented, shared, or embedded through digital means, the imperative to collect digital artifacts only grows. Preserving and providing access to these collections makes their future use, study, and examination possible, helping us investigate the world, society, culture, and change.
The Internet Archive’s work also includes a robust acquisitions program. In a conversation with Library Journal, Open Libraries Director Chris Freeland explained that Open Library “accepts donated books and other materials from libraries, digitizes this content in its scanning centers, makes those materials available to the public via controlled lending at openlibrary.org, and provides a set of the digitized files to the donor library.” This unique program gives libraries much-needed digitization support free of charge, while also allowing them to open their collections to a wider community of patrons. Forty-eight library partners have joined the program, including MIT Libraries, Georgetown University Libraries, and Stanford Law Libraries.
Despite the Internet Archive’s history as a pioneer in digital collections and archiving, the organization now faces a lawsuit over the controlled digital lending (CDL) practices behind Open Library and the National Emergency Library (NEL). Launched in the early stages of the COVID-19 pandemic and now closed, the NEL aimed to expand access to digitized versions of print material to support online teaching and learning while many school and public library print collections were out of reach. Unlike HathiTrust’s Emergency Temporary Access Service, which provided digital surrogates only to the libraries that owned the original print editions, the NEL opened access to a much wider audience without that restriction.
The NEL adopted the most liberal interpretation of controlled digital lending, a practice that has become a powerful access tool at a growing number of research libraries in recent years. CDL maintains a 1:1 ratio between print books owned and digital copies lent: a library may circulate a digitized copy only as long as it keeps the corresponding print copy out of circulation. The NEL loosened this ratio by suspending the waitlists that normally limit each digitized copy to one borrower at a time. CDL’s supporters argue that the practice falls within fair use and resolves the “20th century book problem,” in which “many 20th century books are not available for purchase as new copies in print or as digital versions online.” David R. Hansen and Kyle K. Courtney outline their rationale clearly:
Our principal legal argument for controlled digital lending is that fair use—an “equitable rule of reason”—permits libraries to do online what they have always done with physical collections under the first sale doctrine: lend books.
While CDL opens up print collections to new possibilities for meeting the needs of distance learners and for responding to crises in which print collections become unavailable, it does have its critics. The Authors Guild has issued several statements opposing the practice, arguing that libraries do not have the right to convert print material to digital formats without the approval and written consent of the author or copyright holder. It has also expressed serious concerns about author compensation, especially during the COVID-19 pandemic, when many authors faced cancelled book tours and related public engagements.
The complaint against the Internet Archive was filed on June 1, 2020, by Hachette Book Group, HarperCollins, John Wiley & Sons, and Penguin Random House. The complaint not only targets the practice of CDL directly but also accuses the Internet Archive of falsely posing as a library in order to expand its reach and influence and further infringe established copyright law. Referring to the Open Library and National Emergency Library as a “scheme,” the complaint claims that the Internet Archive and its services “grossly exceed legitimate library services, do violence to the Copyright Act, and constitute willful digital piracy on an industrial scale.”
Many organizations and institutions affirm the Internet Archive’s legitimacy and reject any suggestion that it is not, in fact, a real library. SPARC, the Scholarly Publishing and Academic Resources Coalition, released a statement in support of the Internet Archive on June 29 and provides extensive resources on the use of CDL in research libraries. In a guest post for the Center for Democracy and Technology, Avisha Sabaghian highlights what makes the lawsuit so troubling:
Many libraries support and use CDL—not to replace publishers’ controversially restrictive and prohibitively expensive e-book licensing agreements, but to supplement the current e-book market’s deficiencies. As outraged voices position the Archive as inimical to the interests of libraries and authors, some fear that CDL, a framework so enormously empowering for 21st-century libraries, will fail to ever realize its true potential.
New strategies for library advocacy in support of creative collection practices like CDL can help address the market deficiencies outlined above. It is imperative that the Internet Archive be recognized as a legitimate library and archive, and as a leader in digital archiving, collections, preservation, and access. Put simply: there is no comparable institution. The accusations against the Archive put far more at risk than CDL; its 45+ petabytes of server space preserve a record, and a backup copy, of our collective digital footprint. SPARC links to a form that allows individuals and organizations to sign its position statement in support of CDL, and the Internet Archive routinely shares how temporary access to its digitized collections helped educators through the abrupt shift to online teaching. The availability and popularity of Open Access and Open Educational Resources also offer promising new directions for both research and library collections.
Interested in learning more about CDL, open digital collections, and some of the programs mentioned here? Please check out the links below:
- View Open Libraries collections at the Internet Archive
- Read about the National Emergency Library
- Read about Columbia's ETAS participation
- Stream Recorder: The Marion Stokes Project for free on PBS
- Read about the development of The Whole Earth Catalog Archive
- Read TC's open access journal Current Issues in Comparative Education
- Read the white paper on controlled digital lending
- Read the complaint against the Internet Archive
- Participate in TC’s COVID-19 Community Archive
- Run a sample search for Open Access materials