Conclusion

Constantly Averting the Digital Dark Age

The digital dark age was largely averted by 2001, at least in terms of the most apocalyptic predictions made only a half-decade earlier. The widespread destruction and loss of digital heritage would not happen. With the establishment of the Internet Archive and national library programs, the big question was how sustainable their holdings would be over the long term. This was like the dilemma facing the September 11 Digital Archive. As the web and its archives aged, would its memory endure? Without active investment, digital content disappears. Furthermore, could collecting institutions continue to innovate and rise to the challenge of new trends and technology? The web constantly changes, from new platforms to dynamically generated content. A 2002 web crawler could not capture the dynamic web of 2012, let alone 2022 or today. Perpetuity is a long time. Perhaps it is hubris to imagine that the digital dark age has been averted? Perhaps traditional archiving is an impossibility in the “age of algorithms,” with personalized experiences instead requiring that we document user experiences.1

Fortunately, the field continues to rapidly evolve. Institutions have not rested on their laurels. In 2002, cognizant of the value of having more copies around the world to steward the long-term health of the Internet Archive, as well as given its symbolic geographical value, the new Bibliotheca Alexandrina in Egypt acquired a copy of the Internet Archive’s holdings.2 The first Library of Alexandria had aimed to be a universal library. How better to help create the new one than by donating a comprehensive web archive? National library web archiving continued to expand around the world as the technology became more accessible and commonplace. While still a field dominated by the Global North, it is slowly expanding beyond its Western roots. In 2004, the National Library of Korea began the Online Archiving & Searching Internet Sources (OASIS) project, creating its own national web archive. Then, in 2010, the National Diet Library of Japan was given the authority to archive public agency websites, and in 2019 the National Library Board of Singapore began comprehensively crawling its web domain as well.3 In the aftermath of the 2016 American federal election, as Donald J. Trump assumed the presidency, web archivists and allied activists around the world acted to ensure the preservation of vital American government data. Much of the focus was on climate change data, but the importance of documenting the transition became global news.4 Other events, from the global pandemic to wars to subsequent terrorist attacks and protests against racism and police brutality, continue to underscore the need for web collecting. Eternal vigilance is key. Ephemeral content shapes our world. Collecting is a constant task, carried out by vigilant professionals.

The Digital Dark Age Never Dies

Even in this, the age of widespread preservation, the idea of a digital dark age continues to haunt us. It is a concept with surprising longevity. Perhaps this is because the phrase takes a complicated problem and evocatively summarizes it. In 2003, a syndicated wire service story again raised the fear that even if an adult reader could look at traditional, yellowing photos of their childhood, their “grandkids probably won’t fare as well with your digital photos. The computer files may survive but the equipment to make sense of them might not. This era could become a ‘digital dark age,’ a part of its collective memories forever lost.”5 This framing now ignored the work that went into these problems. Tropes and themes from the mid-1990s are continually reborn. The digital dark age endures.

This is frustrating. Save apocalyptic collapse, the digital dark age has been averted. Anything that would lead to a digital dark age would presumably lead to a nondigital dark age as well. In such a case, perhaps we would instead be looking to A Canticle for Leibowitz for inspiration. The stewardship of digital documents is a solved problem insofar as we understand the need to shepherd information (not to say that new formats, platforms, or copyright restrictions do not cause trouble). If the Library of Congress ceases to exist, the digital information it holds may disappear as well—but so would its other holdings. We should not discount the need for an organization like the Long Now Foundation, but we also should not irrationally fear that our collective digital heritage can disappear in the blink of an eye.

Fears of the digital dark age continue unabated, seemingly disconnected from the developing digital preservation field. In 2003, a letter published in Nature raised fears of a digital dark age of email, worrying about the “potential for the loss of records that may have immense historical value.”6 In 2009, the metaphor shifted somewhat with chilly fears of a “Digital Ice Age” (“we may find our files frozen in forgotten formats”).7 The next year saw Kurt D. Bollacker advising in American Scientist on how society could avoid a digital dark age so that we could ensure the preservation of “the record that future generations might use to remember and understand us.”8 Indeed, well over a decade after the original coining of the term, archeologist Stuart Jeffrey raised the specter of a “new” Digital Dark Age in 2012, as his colleagues began to use commercial tools in an “open, dynamic, and fluid” context that perhaps eluded traditional preservation approaches.9 Jeffrey’s point was an important one. Digital preservation requires continual engagement and investment, which it is indeed seeing.

Most vividly, in 2015, Vint Cerf, then a Google vice president, declared at the American Association for the Advancement of Science annual meeting that without action to avert digital loss humanity faced a “forgotten generation, or even a forgotten century.”10 Cerf’s intervention that year probably gave the idea of a digital dark age more attention than it had ever had. Stories on the topic appeared in the Atlantic, the BBC, and media outlets around the world. Scholars and practitioners in the field were disappointed. Michael Nelson, a computer science professor at Old Dominion University and a leading web archives researcher, likened the media frenzy around Cerf and the digital dark age to “having your favorite uncle forget your birthday, mostly because Cerf’s talk seemed to ignore the last 20 or so years of work in preservation.”11 Of course, Cerf had been part of the first generation of agitators involved in raising the need for web archiving and digital preservation more generally. Yet even if Cerf had overlooked some work that had been done, perhaps it suggested that the memory community needed to better articulate its value and activities. In some ways, it was more a problem of publicity and awareness than one of preservation. It also speaks to the lack of a public understanding that the web remembers, which has profound consequences for those who unwittingly forget about institutions such as the Internet Archive. Many a politician would have avoided controversy had they known that the Internet Archive existed.

While Cerf became increasingly involved with the International Internet Preservation Consortium and modern web preservation, the idea of a digital dark age persisted. Another Google employee, Rick West, raised fears of a digital dark age in 2018, in terms that echoed the prospects raised almost fifteen years earlier. West noted that society “may [one day] know less about the early 21st century than we do about the early 20th century . . . The early 20th century is still largely based on things like paper and film formats that are still accessible to a large extent; whereas, much of what we’re doing now—the things we’re putting into the cloud, our digital content—is born digital.”12 The persistence of these tropes suggests to some degree an appetite from the worried public or journalists to explore information’s ephemerality. It may also reflect the growing role of digital loss in our own lives. Many of the earliest digital preservationists were spurred into action by their own experiences. How many of us now face digital instability in our daily lives? Attention is not bad. By worrying about a digital dark age, we avoid it.

Some of this anxiety is well placed, especially when specifically aimed at the problem of preserving material created or posted on third-party platforms such as Facebook and Instagram. As private user-generated information is locked behind passwords and beyond the reach of web archivists, the data people put on Facebook is at Facebook’s mercy. This raises genuine concerns around a renewed digital dark age: not file formats or lost websites per se, but rather how our memories are deposited on private platforms. As Adam Shepherd evocatively asked in 2019, “How many of the photos and videos that you’ve shared on Facebook and Instagram do you have copies of in other places?”13 My personal website will be preserved by the Internet Archive, and our national library in Canada will have copies of news sites and government pages that provide broader context to my life, but most of my personal photos and thoughts live on Google Photos and Twitter (now, sigh, known as X). If those sites were to shut down (and one day they will), would my earliest photos and tweets about my children disappear too?

The specter of a digital dark age thus forces us to continually think about the longevity of digital information. If your Facebook account were hacked, or if you died, or if you wanted to ensure somebody had access to it in the medium-term future, would your information be safe or accessible? If not, what steps could you take to make it so?

Will web archiving become a victim of its own success? The institutionalization and routinization of web archives run the risk that they are taken for granted. Seemingly reliable operational initiatives do not always get the attention that they deserve. While web archiving today is a core library function, for most institutions it is minimally staffed. Web archiving is done on the side of desks, with few dedicated personnel. In the United States, for example, in 2017 four out of every five web archiving institutions employed fewer than one person to do this task.14 This is a lot of critical work falling on few shoulders. The ease of web archiving has not been matched by investments to improve the overall capacity of web archiving.

The COVID-19 pandemic brought the implications of this underresourcing into relief. In the first few weeks of the crisis, web archivists moved into action, documenting spreading lockdowns across much of the Western world, preserving images of empty streets, and social media conversations that debated public health interventions. Yet as weeks turned into months, archivists began to feel what was described as “curatorial fatigue.”15 Without sufficient support, individuals were overwhelmed by the importance of the task layered on top of other work responsibilities and personal challenges. In the wake of the 11 September 2001 terrorist attacks, teams of curators worked together to document events and establish portals. There was a spirit of innovation and of breaking new ground. Now, with web archiving an established yet peripheral function, too many librarians and archivists found themselves overwhelmed. In other words, a tall order has been set for many of our web archivists today: no less than deciding the fate of much of our historical record. A lot of weight is put on their shoulders. Algorithms alone will not avert the digital dark age. People will.

The Right to Be Forgotten and the Pitfalls of Averting the Digital Dark Age

When the Internet Archive first became known, many observers were alarmed. What about the privacy implications? In July 1996, David Berreby encouraged his Slate readers to consider the “most embarrassing e-mail you ever wrote, available to anyone curious enough to go looking . . . As we’re encouraged to exult over the vast new volumes of information that are becoming easier and easier to capture, remember that the art of losing is also important to master.”16 Dan Gillmor similarly worried in September 1996 about the prospect of “every dumb thing [that] I’ve said on-line” being saved.17 The following year, in March 1997, John Markoff worried that the bigger the web archive, the bigger the Big Brother problem would be.18 When the Wayback Machine launched in 2001, it provided relatively complete access to the Internet Archive collection. However, as a user needed a URL until the recent advent of limited keyword search, the Wayback Machine provided privacy through obscurity. The apocalyptic visions of 1996 did not come to pass. Public and private people alike have been affected by the Internet Archive, and things may have been remembered that they wish were not.

Yet by 2010, there was also increasing concern about the impact of old, decontextualized information. Viktor Mayer-Schönberger’s 2009 Delete traced in part the collision between Web 2.0 (by 2001, as he notes, “users began realizing that the Internet wasn’t just a network to receive information, but one where you could produce and share information with your peers”)19 and the accessibility of personal information on the web. At times Mayer-Schönberger overstates his points: the argument that “forgetting has become costly and difficult, while remembering is inexpensive and easy” is not true.20 However, his argument that “comprehensive digital memory represents an even more pernicious version of the digital panopticon. As much of what we say and do is stored and accessible through digital memory, our words and deeds may be judged not only by our present peers, but also by all our future ones” is profound.21 Reactions to this emerging trend have taken different shapes around the world.

The “right to be forgotten,” as a legal precedent, stemmed in part from a Court of Justice of the European Union ruling in the 2014 Google Spain decision. The right drew on deeper histories, concepts, and precedents. The case involved a Spanish man who was unhappy that a Google search for his name surfaced details of a government auction of his property. He felt this resolved situation was no longer relevant and impugned his reputation.22 The Court of Justice ruled in the man’s favor, holding that Google had a responsibility to balance its rights with those of users, giving the plaintiff the right of erasure.

This “right to be forgotten” influenced the European Union’s subsequent 2018 General Data Protection Regulation. GDPR enshrined the right of individuals to request that their data be erased for specific reasons (which then needed to be balanced against free speech rights). As of 2020, approximately 45,000 requests had been lodged with regulators to have information delisted from search engines in Europe, of which just under half (43%) were deemed to be valid.23 Reflecting different legal and cultural traditions, many North American commentators reacted with surprise to these decisions and regulations. Would the “right to be forgotten” not limit freedom of expression?24 The leaders of the Wikimedia Foundation (which oversees Wikipedia) worried about these rulings creating the prospect of “an internet riddled with memory holes—places where inconvenient information simply disappears.”25 Would all the hard work of web archiving be undone if inconvenient documents could disappear through legal fiat?

Legal scholars Melanie Dulong de Rosnay and Andres Guadamuz argued that ultimately the legal requirements would have little to no impact on web archives, given the already existing policies and approaches to privacy.26 The Oakland Archive Policy already implicitly gave most people a right to be forgotten. Conversely, as much of the right to be forgotten involves delisting content, the combination of the lack of full-text search in most web archives as well as the relegation of national library legal deposit collections to on-site only access had already established a balance between privacy and access. As one does not need to delete content to conform with the right, only to make it less accessible, web archives were compliant.

Improving computational access may unsettle this balance. Some of the original ideas around access, such as treating archives as akin to census data, had tried to strike a balance between privacy and access. Since the Wayback Machine’s launch, most models have implicitly adopted a privacy-by-obscurity approach. In the case of national libraries, some have been required to restrict access to on-site reading rooms. On-site access, which often strikes observers as ludicrous—forcing a researcher to physically travel to sit at a computer to view networked resources—is a reasonable compromise between access and privacy. Sitting in a reading room on a special terminal, few readers would mistake archived webpages for live content. Context collapse can be avoided. Parallels can be drawn with the digitization of nondigital archival sources, which also raise ethical access questions.27 Giving access to researchers in a reading room is different from unfettered decontextualized access to anybody with a Google search query, even if both collections are, in theory, open. Conscious of the burden that travel places on many researchers, particularly those with limited travel funding or caregiver responsibilities, perhaps a similar outcome could be achieved through a Virtual Private Network (VPN). Some degree of friction is not a bad thing.

One fear of unfettered digital access is the prospect of an archived website or social media post being consumed out of context and used to shame somebody. For example, in my earlier archival work on 1960s student activists, I encountered documents that raised questions about how my reaction might have been different had these activists tweeted or even had all of their material digitized rather than having letters that ended up in archival boxes.28 One letter, from the McMaster University Archives and Special Collections, contained 1962 musings by a student activist in their late 20s who thought it would be “fantastic news” if nuclear-tipped Bomarc missiles (which were to be used over Canada in an anti-aircraft defense strategy) could be redirected by the Soviet Union after launch to turn around and destroy their launch bases. This letter helped me understand the context of the New Left, presenting a vivid example of how the New Left was very different from the mainstream Canadian social democratic left. Yet the thought process of that letter in isolation is horrifying on the surface. It effectively mused about the deaths of Canadian military personnel and those nearby. But as part of a broader collection of correspondence, it illustrated the ways in which New Leftist intellectual thought was developing.

As a researcher sitting in an archive, I found the letter to be unremarkable. Yet if a keyword search for the author were to surface it, readers might be less charitable sans context. Suddenly the need to travel to another city to visit an archive or log into a cumbersome VPN seems less an inconvenience than, perhaps, a part of thoughtful historical research. What if that letter had been a tweet or a blog post in a web archive? A student writing that today could become the target of an outrage campaign.

In 2016, my colleague Nick Ruest and I were bullish about what social media meant for future historical research. “Consider what the scale of this dataset means,” we wrote. “Social and cultural historians will have access to the thoughts, behaviours, and activities of everyday people, the sorts of which are not generally preserved in the record.”29 I still stand behind this. Historians, professionally trained and responsible ones, conscientiously grappling with the past’s complexity, will try to understand the proper context of archival documents. When I discovered the 1962 Bomarc missile letter, I did not turn to a conservative Canadian media outlet, claiming to possess the smoking gun of the New Left’s moral degeneracy. Context matters: private correspondence, part of an ongoing debate around the adoption of nuclear weapons in Canada, by a young student exploring new ideas in a climate of intellectual exploration. I suspect few professional historians would think differently.

On balance, I believe the value of a society-wide “right to be remembered” outweighs in general the value of an individual’s right to be forgotten. This needs to be considered in a context of archival ethics and care. The compromise position of complicated on-site or remote access, requiring researcher registration, compels researchers to view and think about documents in context. By doing so, readers know that web archives are not just websites like any other on the live web. Furthermore, researchers cannot easily share the “gotcha” moment, flattening time as if a blog post from 2001 is the equivalent of one in 2021.

Historical processes unfold by virtue of human choices and decisions. The current situation when it comes to digital access was not inevitable. The international library community has generally made a conscious choice toward expansive selection. In the case of the Internet Archive, it ultimately adopted relatively open access policies, but it handled privacy through obscurity. As institutions develop next-generation search and retrieval systems, the lessons of the past serve as concrete inspiration about alternate visions of web archival access.

The Little Digital Dark Age, 1991–1996

The direst predictions of a digital dark age were averted by the beginning of active web preservation in 1996. Yet there was a short dark age after all, between the advent of the web in 1991 and the development of memory institutions in 1996. In some ways, this ultimately formed the dreaded “digital gap” that commentators such as Danny Hillis had worried about. Seeing what we have lost can help us gain a better appreciation of what has been saved.

Not everything published on the web before late 1996 has been lost. Magazine articles, newspaper accounts, oral interviews, and journal articles all help to reconstruct the earliest web. Digital forensics can as well, with considerable effort. Tim Berners-Lee’s first website, for example, had been launched in December 1990 at CERN. By the time web archiving began in 1996, it had been converted into a museum site. An effort to reconstruct it drew not only on preserved files but also on an array of other contemporary sources to recreate the browser experience as it might have been in 1990.30 Similarly, the first American webpage, the 1991 homepage of the SLAC National Accelerator Laboratory, was restored from backup. Researchers needed to undertake a fairly involved process to convert an “original list of scattered files into an accessible and browsable website.”31 Apart from these rare exceptions, however, most of the actual websites from this period are lost. The labor needed to reconstruct sites like this does not scale. We can reconstruct a few significant sites with great effort and the fortune of either backups or contemporary documentation. Most sites are beyond this.

What have we lost? We can look to events that took place in 1995 and early 1996 to understand the gap. Before 1995, few corporations or people were on the web. Accordingly, 1995 to 1996 was the earliest period when the web was part of many people’s lives.

Political examples are perhaps the most obvious. Historians can look to significant events to then in turn see what remains. In October 1995, the Canadian province of Quebec held a referendum on whether the province should pursue sovereignty. Digital historian Ryan Deschamps conducted preliminary research into what a digital history of 1990s Canada would look like. He found that despite evidence from print media and Usenet discussion boards that both the pro-independence and pro-federalist sides had important web presences, they were largely not preserved. Given the importance of the Quebec referendum to an understanding of modern Canadian history, this is a major loss. As Deschamps notes, “it was not clear to people in 1995 that web pages were historical documents that people might want to use for research in 2017 [or beyond].”32 While the Bibliothèque et Archives nationales du Québec does have webpages from 1995, the sovereignty campaign pages were lost.33

It is distressing to think of the material that was lost from the early web that we do not know about. Early cultural sites? Homepages? Jokes? Academic servers? Wikipedia’s “List of websites founded before 1995” is an overview of some of the kinds of early websites. As a crowd-sourced document, it is well positioned to draw on the collective memory of early web users.34 Science museums, web comics, religious movements, campus newsletters, and business websites all feature on this list. The Exploratorium science museum in San Francisco, for example, opened its website in 1992, but the earliest snapshot we can access today dates from January 1997.35 The Economist’s first website, launched in March 1994 by one of its correspondents, was reconfigured after eighteen months. As an Economist author bemoaned, “[a]ll records of the original website were subsequently lost. So much for the idea that the internet never forgets. It does.”36 Websites that lasted into 1996 or 1997 may have been preserved. Those that did not last that long, or were small enough that they eluded detection by web archives in their smaller early years, have not. If it were not for the people and institutions discussed in this book, we would have more stories like this. The gap would have been a lot longer. Ultimately, the Little Digital Dark Age of 1991–1996 illustrates how fortunate we are today.

Web archiving and digital preservation have come a long way between the early 1990s and 2001 and continue to develop today. The implications of this are still making themselves clear. As historians move into the web age of history, it will be important for them to know how the archives they use have been constructed and how they came into being. None of the web’s memory is natural or intrinsic to the platform itself. It has been painstakingly constructed, the product of countless conscious decisions across Silicon Valley, national capitals, and academic institutions. Today, the Internet Archive is increasingly part of the internet’s core infrastructure. As research libraries and national institutions preserve swaths of our cultural digital heritage online, it is critical to remember that none of this was inevitable. The web naturally forgets, and it is up to us to help it remember.
