Introduction

In early 1996, the web was ephemeral. By 2001, the web remembered. Today, most of us know that when we share something on the web—a tweet, a blog post, an article, a photo—it can last forever. Yet longevity is not an innate feature of the web. The development of the web’s memory was not accidental, nor was it the product of a coordinated master plan. What began as worries amongst research librarians, technologists, futurists, and writers from the Second World War onwards laid the groundwork for today’s digital memory. If, in the mid-1990s, commentators worried about a “digital dark age,” we are now in an age of historical abundance. Memory institutions preserve petabytes of information every year, much of it generated by ordinary people as they go about their everyday lives.

The specter of a “digital dark age” haunted libraries by the mid-1990s, portending a dark future with no memory. “The digital medium is replacing paper in a dramatic record-keeping revolution,” warned Jeff Rothenberg in a January 1995 Scientific American article, adding that “such documents may be lost unless we act now.”1 Fears only escalated over the coming years. The combination of more data than ever before being stored, with the need to preserve it over a longer time frame than usual, was, as information scholar Margaret Hedstrom put it apocalyptically in 1997, “a time bomb.”2 Terry Kuny wrote in 1997 at the International Federation of Library Associations conference that to be digital meant “being ephemeral.” Kuny predicted that “it will likely fall to librarians and archivists, the monastic orders of the future, to ensure that something of the heady days of our ‘digital revolution’ remains for future generations.”3 Stewart Brand, former publisher of the Whole Earth Catalog and later cofounder of the Long Now Foundation, noted in 1998 that while “we can read the technical correspondence from Galileo . . . we have no way of finding the technical correspondence” of the digital age.4 To these thinkers and leaders, the digital turn portended our cultural record’s destruction.

These commentators articulated fears about the ephemerality of digital information, popularizing conversations that had been percolating around the library and information community for decades. Information has always been fragile, but by the 1960s the prospect of electronic or machine-readable records had increasingly complicated the preservation landscape. Fears of loss accelerated in 1994 with the launch of the joint Commission on Preservation and Access and the Research Libraries Group’s Task Force on Archiving of Digital Information, raising the profile of the newly coined “digital preservation” field among research libraries and archives. Networked communication, especially with the rapidly growing World Wide Web, further raised the stakes. The web was part of a much broader context of pressures stemming from changes in digital storage formats, which in a matter of years had seen a transition from large floppy disks to smaller diskettes to CD-ROMs. As personal computing became more accessible by the 1990s, users could see information that had been accessible only a few years earlier become lost in a sea of obsolete technology or file formats. The growing number of computer and then web users made the prospect of a digital dark age less science fiction and more a reality.

By the late 1990s, the web’s ephemerality became a major challenge as thousands raced to join the “information superhighway.” Where there had been under 2,500 websites in 1994, by 1996 there were over a quarter of a million and by 1997 over a million, with the number exponentially growing.5 Experts and users alike came to see that important websites could suddenly disappear: a server taken offline, a student graduating from college and losing their account, fees going unpaid, and the site—poof!—would be gone forever. The ephemerality of web content was clear. By the mid-1990s, this was affecting more than just a few disappointed geeks. Commentators and technologists saw this as the potential collective imperilment of the human record. A digital dark age would be a calamity. Would the web be ephemeral? Out of need came innovation, represented in part by the 1996 founding of the Internet Archive.

Flash forward a half-decade to the morning of 11 September 2001 in the United States. Hours after hijacked airplanes crashed into the World Trade Center, the Pentagon, and rural Pennsylvania, memory infrastructure sprang into action. The Internet Archive, Library of Congress, and memory institutions around the world acted to preserve thousands of websites, tens of thousands of emails, and other digital artifacts relating to the attacks and their impact. Existing digital infrastructure at these diverse institutions operated to ensure that by the end of the day itself, historians would have hundreds of website snapshots to draw upon to understand that tumultuous day and its aftermath. Only a month after the attacks, on 11 October, a web portal was launched to provide immediate access to a replay interface that let users go “back in time” to websites collected on the day of the attacks. Scholars soon thereafter launched a crowdsourced platform to gather digital information, including digital voicemail recordings, email listservs, and digital photographs. While the record is not complete (during the twentieth anniversary coverage, commentators bemoaned the rich digital exhibits lost to the obsolescence of Adobe Flash), it is as comprehensive a record as we have of any major historical event of its magnitude.6 Rather than serving as evidence of a digital dark age, the terrible events of September 2001 suggested that a golden age of memory had dawned.

This formative period for web archiving remains relevant to today’s web archiving and digital preservation landscape. Between 1995 and 2001, the field witnessed the emergence of the Internet Archive, national library collection programs in Europe and North America, new international coordinating bodies, and a cultural consensus within the library and archives field that this work was both central and necessary. While this book is very much a story of developments in the Global North, primarily due to the high costs of these digital preservation programs and their accompanying infrastructure needs, the programs implemented in a small number of affluent countries came to have a global impact. Many of these programs were cemented by late 2001. If there had been any doubt, the collaborative mechanisms put in place after the 11 September 2001 attacks solidified policies and procedures for the future. The core programs that today form the foundation of the global web archiving landscape owe their origin stories to these critical years.

If there was a digital dark age for the 11 September attacks, it is measured in hours rather than days, months, or years. In 1996, our digital heritage was fragile and ephemeral. A digital dark age loomed. Five years later, a cadre of professionals moved into action within hours, actively preserving events as they happened. Between 1996 and 2001, dedicated information professionals—at the Internet Archive, libraries, and elsewhere—averted the digital dark age.

Dark Age to Golden Age: The Preservation Story

Averting the Digital Dark Age explores this shift from fears of digital loss to our current state of abundance. It does so by examining the intellectual ferment between the web’s 1991 birth and the coming of age of web archiving programs in 2001, represented by both the 11 September attacks and the launch of the Wayback Machine portal shortly afterwards. While in some cases the book looks ahead a few years to see the culmination of some processes, particularly around legal deposit and copyright, in the main it focuses on this critical period. In other words, if in 1996 and 1997 the affluent among Western society confronted the specter of a digital dark age, only four years later we had entered a period where we had the potential for a golden age of robust historical records. This was a rapid shift. Why does the web remember today? And what can we learn about this transition as we look ahead to future medium shifts?

Today, we grapple with the opposite of a digital dark age. What are the implications of all the remembering we do? Should people be beholden to comments they made ten years earlier, especially if made when a child? Do they even realize that the web remembers? Should data be allowed “to die”?7 The web today is a mixture of fragility (the ever-present 404 error signaling that a page is missing) and permanence (the ability to pull up a deleted item from the hundreds of billions of websites preserved by the Internet Archive). The web does not have a built-in memory system, but libraries and other memory institutions around the world fill that function. What is collected and what is not (the “archival silences”) are shaped by the historical processes that gave rise to this preservation infrastructure.

This unsettled situation has its roots in a series of decisions stretching back decades. On the one hand, the web inadvertently ended up as a fragile, ephemeral platform. The decisions that allowed it to scale rapidly across the world to become a “world” wide web also meant that links broke, servers disappeared, and a failure to pay a domain rental fee felled many a website. Realizing that, memory professionals pragmatically and presciently developed memory systems. As they were not integral to the web, many web users today (and over the last two decades) have been caught off guard by the limitations as well as the scope of digital memory.

In adopting the moniker of a “digital dark age,” commentators and pundits were drawing on a long tradition of invoking an evocative, if apocryphal, understanding of the past. The period between the end of the Western Roman Empire and the Renaissance was long and erroneously understood as a period of “darkness,” a marked discontinuity between the Roman Empire and the early modern period. While the term has fallen out of historiographical use, there is still a popular understanding of the medieval era as a dark, superstitious, violent time (one rarely uses the adjective “medieval” as a compliment). Due to this legacy, the “dark ages” is a useful term for cultural commentators. As Matthew Gabriele and David M. Perry note, “the particular darkness of the Dark Ages suggests emptiness, a blank, almost limitless space into which we can place our modern preoccupations whether positive or negative.”8 For early Renaissance thinkers, the concept of a dark age was useful for drawing a distinction between the darkness of the recent past and the brightness of antiquity, which helpfully denied continuity between the Roman Empire and the Holy Roman Empire.9 The term, and the historical implications stemming from it, have thus been useful to commentators for centuries. In any case, even during the period we dismiss as the Dark Ages, much of central Asia was undergoing an age of enlightenment, a flourishing of science, culture, and philosophy.10

Compounding this, record-keeping practices have always impacted our histories and led to inclusions and exclusions. In some ways, for example, we struggle to understand central Asia’s age of enlightenment due to a lack of transcription and translation of critical manuscripts.11 This period, an obvious counterweight to claims of a dark age of human history, is partially obscured by more recent decisions around records and now digitization. Selectivity around sources also influenced Western European historiography and helped contribute to the dark-age framing. Historian Patrick Geary notes that “what we think we know about the early Middle Ages is largely determined by what people of the early eleventh century wished themselves and their contemporaries to know about the past.”12 The Dark Ages are less a product of their own time—and more the outcome of decisions made by successors about what records to retain. History is often more about the stories that are told about the past and the records that are kept than about the past itself.

Archivists and record keepers are thus critical to the construction of the past and our history. If we have a digital dark age of the 1990s or 2000s in our future, it will owe less to the decisions of record keepers made at the time and more to our failed long-term commitment to stewarding this information over the decades and centuries to come. The real challenge of digital preservation is organizational rather than a matter of changing formats or disks.

Discussions around the idea of a digital dark age, both in the media and among librarians, archivists, historians, and policy makers, set the stage for two main approaches to web preservation. The first, the Internet Archive, represented a private, nonprofit, technologist approach to the perpetual preservation of web content. Founded by technology entrepreneur Brewster Kahle in 1996, the Internet Archive responded to prevailing cultural trends while also drawing on Kahle’s earlier (and unique) experience across the digital libraries and information retrieval communities. Yet for the first decade (or longer) of its existence, many feared that the Internet Archive would disappear into the ether, plunging the web back into a digital dark age. With such an accumulation of information, this would be akin to losing a modern Library of Alexandria.

This private initiative was counterbalanced by a second approach, that of national library web archiving in several affluent countries. States can persist when private organizations fail. As early as 1994, the National Library of Canada piloted web-based archiving, followed in 1996 by large-scale projects carried out by the national libraries of Sweden and Australia. These programs offered the institutional stability that the Internet Archive appeared to lack, albeit at the cost of being less flexible due to government regulation and resource constraints. Together, the Internet Archive and national libraries formed an effective memory system.

These approaches complemented each other when it came to the long-term stewardship of digital material. The Internet Archive innovated, collected, inspired, and took on risk. Make no mistake: collecting the web at scale was risky business. Risk was present at every stage: the risk of lawsuits, of breaking copyright law, of losing priceless and irreplaceable data. The Internet Archive could assume these risks in a way that an institution like the Library of Congress could not because of institutional contexts, bureaucracies, and a different risk appetite. Indeed, many national libraries continue to struggle today with collecting and access, often owing to internal risk calculations. The Internet Archive assumed a great deal of the world’s web archiving risk. Several times, as we will see, legal commentators confidently assumed that it would be sued into oblivion. On the other hand, national libraries offer stability and sustainability. They may not have the innovative energy of an Internet Archive, but they should last longer. In any event, given the potential for state failure, it is good to have digital collections in both public and private hands.

Studying the Internet Archive, national libraries, and individual preservation initiatives in isolation does not do justice to the broad intellectual ferment that underpinned the digital transformation of libraries in the 1990s. To understand the rise of web archiving, it is essential to explore the intellectual conversations of the 1960s onwards, the rise of international networks, and those who brought the problem into public conversation. If in January 1996, the web was ephemeral, merely five years later, in 2001, the picture had become much more complicated. We live in a world today transformed by this shift. The digital records of elections, disasters (natural and human alike), pandemics, childhoods, cultural phenomena, memes, and so forth are all preserved to varying degrees.

Historicizing this question thus helps us to understand the limits of memory for the internet and web. Decisions from decades ago shape our record today. The web “remembers” through the work and policies of people and institutions around the world. Yet it does so imperfectly. Viktor Mayer-Schönberger argued in 2009 that “forgetting has become costly and difficult, while remembering is inexpensive and easy.”13 If this book shows anything, it is that the act of “remembering” the web is neither inexpensive nor easy. It requires continuous investment.

Much of the popular story of this problem and its “solution” revolves around the Internet Archive. Brewster Kahle, a Bay Area technology entrepreneur, who prior to the Internet Archive had codeveloped the Wide Area Information Server (WAIS) architecture, was at the end of a stint at America Online when he began to explore whether to establish an organization to preserve the internet. In early 1996, Kahle founded the nonprofit Internet Archive alongside the for-profit Alexa Internet corporation. That they were founded on the same day speaks to the meshing of commerce and altruism. Alexa would crawl data and use it to develop web navigation tools, whereas the Internet Archive would steward the data in perpetuity as a public good. In this arrangement lay the core of the Internet Archive’s sustainability model. In some ways, the meshing of public interest and commerce can be seen in the Internet Archive’s contemporary Archive-It service, which offers paid subscription services for institutions to carry out web archiving while drawing on Internet Archive infrastructure.

Yet the Internet Archive is not the whole story. In 1994, the National Library of Canada launched the Electronic Publications Pilot Project, or EPPP. This pilot investigated the technical and scholarly implications of harvesting selected scholarly journals and new media publications from the web. At a time when it was by no means assured that the web would be the dominant means to access the internet—competing protocols and platforms included Gopher, Archie, and WAIS—the EPPP laid a foundation for future web archiving. Its activities and 1996 final report influenced and inspired other web archiving projects.

In 1996, the National Library of Australia was motivated in part by the EPPP to begin harvesting websites, with an eye to curating a collection of culturally relevant sites for future research use. That same year, the Swedish Kungliga biblioteket (KB) launched its Kulturarw3 project. Yet the Swedes and Australians took very different approaches. Rather than curating a collection, Kulturarw3 instead sought to find every Swedish website. Arguing that the Canadians and Australians had been too narrow, the Swedes believed that harvesting web-based “ephemera” as well as formal publications would be more cost efficient and sustainable. If the real cost came in the time spent by the selector in choosing which webpages to archive, perhaps just letting the crawler loose on the web would be cheaper in the long run. From these early examples came other web archives, including the Library of Congress’s MINERVA project in 2000. Debates over which approach to adopt spurred intellectual conversations around how to capture the web sustainably and effectively. Directing our gaze back to the debates around national libraries and the much broader intellectual ferment helps to broaden our understanding of why and how the web remembers.

Much of this coalesced in the aftermath of the 11 September 2001 terrorist attacks. Within the first hour, it was clear that the attacks would have profound political, social, and cultural importance. As Americans and others went online to check on one another’s safety, to memorialize, vent, remember, debate, and collectively make sense of the events, national libraries, researchers, and the Internet Archive quickly preserved the events of the day and the months that followed. They created a community-based archive of the attacks and cemented the enduring significance of web archives and digital collecting amongst libraries and the public.

Despite this book’s broad geographic scope, these preservation activities were mostly carried out in the Global North, defined in this book as inclusive of affluent countries that are primarily but not exclusively in the geographic north (for example, Australia is part of the Global North). The web grew throughout the period to become the predominant global network. As Gerard Goggin and Mark McLelland argue in their Routledge Companion to Global Internet Histories, we need to complicate our American and Eurocentric narrative of internet histories to think more broadly about the implications of how the internet was implemented and used in countries and regions around the world.14 Parts of the world beyond the United States and Europe often used different systems in place of the web during much of the period discussed in this book, whether bulletin boards in Taiwan or email lists in Korea, which entailed different approaches to digital preservation. Indeed, the Taiwanese bulletin board system, PTT, has its historical record “curated by peer-appointed moderators . . . who exercise the right to periodically clean up a given board.”15 Such material can be crawled, but unlike the common standard of the web, it requires more bespoke technical approaches.16

Until recently, the infrastructure needed to preserve the web required a level of investment and resources found in few places apart from the private philanthropy and entrepreneurship of Kahle’s Internet Archive or the digital preservation programs of a handful of affluent national libraries. Indeed, even today many national libraries rely on the Internet Archive to provide the core infrastructure of web archiving, finding the specialized staffing and infrastructure costs too onerous for their own operational capacity.

Yet the development of core web archiving infrastructure would allow for web archiving to expand beyond the Global North. Many Asian countries today have large national web archiving programs, as do several South American countries, although web archiving programs in Africa remain rare. Indeed, web archiving awareness and activity remain low among memory professionals even in relatively affluent African countries such as South Africa.17 This movement outside the Global North is a recent one.

The interconnected nature of the web meant that the establishment of web archiving infrastructure in the Global North affected the whole world. As early as 2004, Peter Lor (then national librarian of the National Library of South Africa) and Johannes Britz of the University of Pretoria reflected on this imbalance. “There is little doubt that the national libraries and other responsible institutions in Africa and other regions of the developing world are not yet in a position to harvest and preserve the web sites emanating from their territories,” Lor and Britz observed. Given South-North information flows, the coauthors noted that there were many legal and moral issues to explore. They suggested broad moral guidelines for practitioners to follow when archiving the Global South, including doing no harm, disclosing objectives and anticipated outcomes, focusing on reciprocity and equity, depositing data and publication, and—vexingly—considering the principles of informed consent and confidentiality.18 The ways in which web archiving evolved, however, mean that informed consent is rarely possible. Crawlers cross borders and collect material at a rate that eludes human oversight.

The web grows exponentially, meaning that only a handful of institutions (perhaps a dozen) have the specialized training, expertise, and infrastructure to harvest web content at scale themselves. Web archiving, too, grows ever more complex. Yet the international nature of the web means that crawlers from the Global North routinely harvest material created in the developing world. They do so, however, through algorithms that largely represent Global North perspectives. The ensuing archival collection certainly reflects this bias. South-North information flows continue to shape global information ecosystems. In some ways this marks continuity with earlier national library collecting practices, ranging from foreign collecting to more reciprocal exchanges of government documents for purposes of safekeeping. The Library of Congress, growing in part out of international scientific collecting driven by the Smithsonian Institution in the mid-nineteenth century, amassed large international collections. Its international mandate would see (and sees today) selectors and curators amassing large collections of documents from across the world.19 There is thus continuity in this, as national libraries today use web crawlers to select material, just as they also use purchasing agents or other selectors to curate physical material. Such activities are motivated by the desire to create global collections, as well as a nod toward the value of distributed preservation. Accordingly, even if much of the scope of this book is limited to the development of this infrastructure in the Global North, the impact continues to be global.

The core contribution of Averting the Digital Dark Age focuses on the period from 1995 to 2001, although there is essential context before and after that period that informs this narrative. Much of the first chapter reaches back decades to understand the broader history of what we today call digital preservation. It was only in 1995, however, that the explosion of user-generated content made the prospect of a digital dark age chilling beyond the narrow scope of corporate and institutional records. Digital preservation escaped the staid world of record and archival management and became a pressing social problem. It is one thing to preserve the records of a Fortune 500 company. Records managers can help there. It is another to ensure that your own website, newsgroup postings, and online relationships and communities have a life beyond that of a webmaster’s whim or ability to pay a bill. The digital dark age and the broad problem of digital preservation became a matter of public importance in 1995.

Few histories have straightforward end dates. As I note in my conclusion, digital preservation requires continual attention if it is to be sustainable. Indeed, digital preservation and web archiving remain active areas of research and interest. Technical practitioners discuss best practices for preserving archived material, have conversations about how to expand the curatorial scope to include the voices of marginalized or otherwise underrepresented people, and continually explore opportunities to improve the capture of dynamic events such as protests or conflicts. Can the story of Averting the Digital Dark Age end in 2001 when the field continues to evolve and flourish? Is it hubris to say that the digital dark age has been averted? The conversations around the preservation of the 11 September 2001 terrorist attacks marked a moment when the social debates (Should we preserve this content?) gave way to technical discussions (How can we ensure the fidelity of what we are capturing?). Yet, conscious that an abrupt 2001 ending would unduly cut short my narrative, I carry some stories forward. There is a vast technical literature on this topic, complemented by only a small body of work from the humanities or social science. This reflects the move away from fundamental philosophical questions about whether we should avert a digital dark age and toward the question of how to do so.

The Scholarly Conversation and Structure of This Book

Averting the Digital Dark Age explores how the web stopped forgetting and came to remember. In doing so, it intersects with robust scholarship on web archives as well as libraries and memory more generally. Scholars have explored the theoretical and applied impacts of contemporary web archives, most notably the pathbreaking work of Niels Brügger.20 There is also a growing literature on the use of these collections, including research case studies, ethical explorations, technical refinements to crawling or analysis, and beyond.21 It is also part of a broader field of digital preservation, most accessibly and thoughtfully found in work by Trevor Owens.22

Averting the Digital Dark Age is not a technical guide. While some technical details will be discussed where appropriate, the book emphasizes the social and organizational infrastructure and apparatus that made web archiving possible at scale. Much of the extant literature on web archiving is technical, and indeed, much of the gray material around web archiving stems from the technical conversations between experts on how to carry out this form of archiving. Throughout this book, however, where technological development played a major role in the programs discussed, it will be appropriately centered.

Given the technical bent of today’s web archiving literature, it is perhaps unsurprising that the history of the Internet Archive and web preservation more generally is reduced to several rote paragraphs in most works (including my own earlier work). In these abbreviated treatments, we generally learn that the Internet Archive was founded in April 1996, with some other national library programs beginning shortly thereafter. Little detail or context is provided. In general, the narrative of the digital dark age and its “solution” focuses nearly exclusively on Kahle and the Internet Archive, with perhaps a few nods to a handful of other programs in a sentence or two. This is not to underplay the Internet Archive’s significance. Despite the discomfort with hagiography in contemporary historiography, web archiving’s history is intertwined with Kahle’s vision and efforts, as enacted by the Internet Archive. Yet a disproportionate focus on the Internet Archive neglects the broader cultural moment that gave rise to sustainable web archives around the world and made the Internet Archive’s success possible.23

Beyond the web archiving field, there is a more general literature on digital memory. There, web archiving is often reduced to a short overview. Richard Ovenden, for example, notes that in a hundred years, scholars will look to our digital records as sources. In his words, “there is still time for libraries and archives to take control of these digital bodies of knowledge in the early twenty-first century, to preserve this knowledge from attack, and in so doing, to protect society itself.”24 He understandably worries about the Internet Archive’s sustainability. Ovenden is correct that there is more work to be done around institutional digital preservation and web archiving, but arguably institutions both recognized and “took control” of this problem from the late 1990s onwards. Other works, such as the monumental edited collection Information: A History, discuss the Internet Archive only briefly (understandable given the volume’s broad mandate), highlighting its importance for preserving information but quickly brushing past its origin story with a few nods toward the Long Now Foundation and the web’s lack of an intrinsic memory function.25

While this book is in conversation with the fields of new media history and media archeology, it is at its core a work of historical scholarship. It aims to provide context for media archeologists, but it is not itself a work of media archeology. Informed in part by my grappling with the history of digital preservation, my emphasis is on the representations found within media objects. I am less interested in the physicality of a hard drive or HTML encoding than in what the represented data, placed in context, tells us about the broader social world.26 Media archeology is in part a conscious rejection of histories that are a “telling of the histories of technologies from past to present.”27 This point is well taken: few historians today would write Whiggish histories of progress, and I am not doing so either. While this book has much to learn from these theoretical approaches, especially around the pitfalls of avoiding nostalgia and adopting these theories as an “analytical tool,” per Wolfgang Ernst, it is at its core a work of traditional historical scholarship.28

For this project, I am especially indebted to those studies that situate earlier media technologies or conceptions into broader conceptual frameworks. Ideas of hypertextual communication and the augmentation of human memory often harken back to ideas such as Vannevar Bush’s Memex (discussed at length in chapter 1). Wendy Hui Kyong Chun’s work suggests that Bush believed that he could break history’s discontinuity—a process “due to a historical accident, to our inability to adequately consult the human record, to human fallibility.”29 Just as the Internet Archive’s Wayback Machine seeks to fix the web and provide a memory, Bush’s conceptual 1945 Memex aimed to provide an all-encompassing view of human memory through analog microfilm technology. Chun’s attention to terminology and concepts is helpful, making us think about what we mean by putting information into “memory”—usually we recall from memory, not consciously put things into it.

The work of Matthew G. Kirschenbaum, too, underscores the complexity that underlies digital files and objects. By looking at both the big picture and the “bitstream” of a file, we can understand the longer sweep of digital preservation. In his 2021 Bitstreams, Kirschenbaum gets into the weeds: what does it mean to study, say, a manuscript that was drafted in Word, commented on in iCloud, laid out in Adobe, with versions stored on USB drives, and with sequential and often obscure file names?30 Any deep dive into the Internet Archive reveals the necessity of grappling with these details, and thinking forthrightly about what this kind of scholarship means.31 Crucially, Kirschenbaum is conversant with the sweep of archival conversations around digital records stretching back into the 1960s. He observes that the “shift from an archives (as a place) or even the archive (as a trope) to archiving as an active and ongoing process” was anticipated by archival theorist Terry Cook and requires a different paradigm around the preservation of knowledge. Archiving was previously seen as passive but now “is best understood as a continually active process, requiring ongoing care, attention, and maintenance to ensure that systems remain secure, software stays up to date, connections don’t deteriorate, and bits don’t rot.”32 Indeed, Kirschenbaum’s earlier Mechanisms underscored that the challenges of digital preservation, “while massively technical to be sure, are also ultimately—and profoundly—social . . . effective preservation must rest in large measure on the cultivation of new social practices to attend our new media.”33

There is some irony in the fact that memory institutions themselves are often difficult objects of historical study. The archives of libraries and archives are themselves limited. As many institutions confront accession backlogs for external content, their more recent institutional records remain inaccessible. Given the relatively recent period of this book, this is an especially acute problem. Fortunately, libraries and archives are avid creators of gray literature (position papers, pilot projects, task force reports), all of which are critical to understanding their thinking. In the case of the Internet Archive, and Kahle specifically, much of his private correspondence and files have been fully scanned. Their candor and comprehensiveness are testament to Kahle’s commitment to open knowledge. These documents are supplemented by a series of interviews I carried out with key individuals from national libraries, universities, and the Internet Archive.34 Margaret Hedstrom also generously provided me with material from her own holdings.

Given the book’s scope, I made difficult choices around which institutions and programs to focus on, generally preferring to look at the early adopters. This means that some now-prominent web archiving programs are rarely discussed. The most obvious omission is the development of large national library programs such as those of the British Library, the Bibliothèque nationale de France, or the Danish national library. My book is rather the story of the pioneering institutions that laid the groundwork for these larger programs to subsequently thrive in the 2000s and 2010s.

The book advances its argument through a series of five interconnected chapters, proceeding roughly chronologically. The exact disentangling of the cultural moment in 1996 presented challenges in how to order the chapters. The Internet Archive is the focus of chapter 3 and national libraries the focus of chapter 4, but the two chapters complement each other and could be read in either order. By separating them, I am conscious of the risk of drawing too firm a division between their approaches. However, their separate institutional lineages and the crucial role played by Kahle lend themselves to separate yet related treatments.

Accordingly, the book opens with chapter 1, “Why the Web Could Be Saved: From Machine-Readable Records to Digital Preservation.” This chapter provides the context behind the preservation of a society’s memory in archives and libraries, underscoring the importance of digital preservation. It then discusses the long sweep of attempts to organize the world’s information, from 1890s global catalogs to the work of Vannevar Bush to the web itself. Finally, echoing many other voices in this field, the chapter argues that we need to understand digital preservation as an organizational rather than a technical challenge. All these disparate forces and factors came together with the web and its preservation.

With the need for web archiving established, chapter 2 is “From Dark Age to Golden Age? The Digital Preservation Moment.” It explores the rise of the idea of a digital dark age and the overall shift toward changing our understanding of electronic records and their implications. It does so by following a series of individuals and organizations including information scholars such as Margaret Hedstrom, Microsoft executive Nathan Myhrvold, science fiction author Bruce Sterling, and the Long Now Foundation, who all helped make the idea of digital preservation more accessible to a general audience. In only a few years, the importance of preserving digital information became accepted within information professions as well as cultural conversations more broadly. If in 1991 the web was ephemeral, by 1997 it faced the prospect of beginning to remember. The next step was to build the necessary infrastructure to make these visions possible.

Chapter 3, “Building the Universal Library: The Internet Archive,” pivots to explore the Internet Archive’s origins, as well as the background of its founder Brewster Kahle. It examines the many factors that gave rise to the institution, as well as how a small startup could grow to define and create the international web archiving landscape. Ultimately this was possible because of the intersection of the cultural forces discussed in chapter 2. They had provided a ready audience convinced of the importance of preservation, transforming what might have been a hobby project into one that could shape the global digital landscape.

Chapter 4, “From Selective to Comprehensive: National Libraries and Early Web Preservation,” explores how national libraries adapted their policies and approaches to the digital age. It looks at Canada, Sweden, Australia, and the United States to explore how the earliest web archiving operations began, as well as how they debated amongst themselves whether a selective or comprehensive approach would be most effective. These were debates about whether the mass collection of information would lead to large, low-quality, and ultimately less useful collections, as well as what it would mean for the role of a web archivist as a creator.

Finally, chapter 5, “Archiving Disaster: The Case of 11 September 2001,” brings together the currents discussed throughout this book to see how the web archiving community—both the Internet Archive and national libraries—responded to the consequential events of that day. In the immediate hours that followed the terrorist attacks across the eastern seaboard of the United States, it was apparent that the events would represent a historic political, social, and cultural moment. Archivists and librarians moved into action, creating a robust historical record, complementing it with all sorts of digital collecting activities. The digital dark age of 11 September 2001 would be measured not in months but hours. I then conclude the book with reflections on how historical lessons can inform contemporary discussions around privacy and the right to be forgotten. What have we gained and lost?

Much of this book was researched and written during the COVID-19 pandemic. For many white-collar workers, one enduring image of the pandemic might be a screen: a place to work, connect with extended families, nourish friendships, and beyond. Future generations trying to make sense of COVID-19 will do much of this through electronic records, from government dashboards to social media accounts to disinformation around the virus and vaccines alike. In their research, they will be the unwitting beneficiaries of the legacies discussed in this book. The web now remembers, for better or for worse. By looking back at how the first professionals and activists responded to the rise of the web and the accessibility of the internet, we can better document and preserve our world today as well.
