CHAPTER TWO

From Dark Age to Golden Age?

The Digital Preservation Moment

By the mid-1990s, it was clear to many librarians, archivists, and technologists that the web was both fragile and ephemeral. But what to do about it? The early conversation on digital records, initially centered on corporations and institutions, took on increasing urgency once the records of everyday people were considered and the broad impact of a digital dark age became apparent. While conversations around electronic records had surfaced by the 1960s and 1970s, by the 1990s what had been a debate largely confined to records managers and the archival profession emerged into public consciousness through the frame of a digital dark age. This concept made concerns around digital obsolescence seem like a problem not just for the Fortune 500 and governments but for all of society. Ideas spread outwards from academic venues and fora such as research libraries and the 1994–1996 Task Force on Archiving of Digital Information to have broader social and cultural impact.

Between 1995 and 1998, a series of individuals—including science fiction author Bruce Sterling, Microsoft Chief Technology Officer Nathan Myhrvold, information scholar Margaret Hedstrom, technologist Brewster Kahle, documentarian Terry Sanders, and Long Now Foundation founder Stewart Brand—reshaped the cultural conversation, broadening digital preservation from an academic field into one understood as having wide-ranging implications. At stake was not just the preservation of technical or corporate documents but the collective digital memory of our society. They thus laid the foundations to avert the digital dark age. Many of the academic discussions explored at length in chapter 1 were popularized through these people. This chapter explores the process that built a social and cultural consensus around web archiving. The specific history of the Internet Archive, which owes much to the unique milieu of Brewster Kahle, will be explored primarily in chapter 3.

This chapter is primarily focused on thought leaders rather than everyday people. While in part this reflects the record—the individuals discussed in this chapter left behind a rich documentary record in film, media, and academic proceedings—it also reflects the reality of digital preservation at this time. Regular users would have seen digital obsolescence as perhaps inevitable, or as something to be tackled on an individual level by printing off a document or saving information on a new computer. Even as late as 1996, users still needed newspaper and magazine explainers to trace out the meaning of the 404 Not Found error. One article tried to make the case that the 404 error had entered the popular vernacular—“he’s 404, man” for “someone who’s clueless.”1 Overall, the ephemerality of web resources was something that was still being explained to many users at this time.2

There were other approaches not discussed at length in this chapter. Fan and community archives began to appear on the web in the late 1990s, such as Jason Scott’s aforementioned textfiles.com in 1998 or the new media arts organization Rhizome’s ArtBase in 1999 (discussed later in this chapter). These vividly demonstrated how some organizations and institutions were independently understanding the web as an archival medium in and of itself, absent state leadership or in many cases respect for copyright.3 But, ultimately, the coordinated, long-term perspective of the Internet Archive and other projects distinguishes the discussions in these chapters. It was people like Myhrvold and Brand who realized that the preservation of web-based digital material required a “systems approach.” Yet everyday users would be profoundly affected by these conversations. Their content would eventually be crawled by the Internet Archive and national libraries. They would feel the impact of the conversations explored here.

If in 1990, the term “digital preservation” did not formally exist and conversations were happening in niche technical venues, by 1997 the idea was so commonplace that a prominent documentary like Terry Sanders’s Into the Future could be made about the topic. The need for, and existence of, the field was assumed knowledge by information professionals, and numerous conferences had by then been held on the subject by a wide variety of stakeholders. Much of this now drew upon a utopian faith that not only would the historical record eventually be saved but that it would be a dramatically better one thanks to technology. The digital dark age would be averted. Through technology, the mists that occluded the historical record could perhaps be dispersed, leading to better histories. As part of this, the conversation dramatically expanded to include technologists, artists, and writers, who began to see potential alongside fear. In averting a digital dark age, could we instead rather see a golden age of memory?

Meeting the Challenge of Digital Preservation:
The Challenge of Networked Information

As memory institutions, libraries and archives—both research and national institutions in affluent countries—had by the 1990s begun to build capacity to meet the challenge of digital preservation. Their earlier professional engagement with electronic records would prove helpful. This was fortuitous, as they would soon face the difficult problem of preserving the web.

The 1990s would mark a watershed moment for the field with the newly coined name of “digital preservation.”4 Terry Cook, an archivist and theorist then based at the National Archives of Canada, posited in early 1992 that archives were then on the cusp of the second generation of electronic records. As Cook explains, the first generation dated from the 1970s and 1980s, involving digital objects such as surveys, statistics, and censuses. While we saw their complexity in the previous chapter, these types of first-generation records were comparatively simple as they were “flat files” without many dependencies. “Each flat file, with sufficient documentation,” Cook noted, “could readily be reconstructed to ‘run.’ ”5

The second generation of records, appearing by the 1990s, would be even more difficult to preserve. These new files, as Cook put it, were “large hierarchical, networked, and especially relational databases [where] information is stored in many internal tables, entities or structures, that have meaning only inasmuch as they are related to each other.”6 The sources of information were also shifting from tabular data to “letters, memoranda, policy summaries, operational case files, crucial financial spreadsheets, vital interpretive graphic material, even maps, photographs and sound recordings . . . being converted into the digital bits that make up electronic records.”7 The groundwork for these complex records was laid well before the web, but these early experiences would help set the stage for the web’s subsequent preservation.

One of the first reflections on the unique digital preservation challenge ahead came with “Electronic Technologies and Preservation,” a report written by Yale University Library administrator Donald J. Waters. Presented at the 1992 Research Libraries Group annual meeting, the paper articulated a future shape of libraries and archives in the digital age. The growing amount of digital information presented growing access problems. “Information also is increasingly available electronically as a direct source of recorded knowledge,” Waters argued, noting the added challenge of “compound documents,” which included hypertext, “mixed text and image,” and multimedia.8 Complementing Waters, in September 1992 the final report of the Cornell/Xerox joint study on digital preservation was released. It foresaw digital technology as enhancing access, highlighting a demonstration project that showed how one could remotely access digital images over a network, echoing earlier utopian takes on networked information.9

As increasing attention was paid toward the merits of making information available on the internet and web, it was not long before digital preservationists began to worry about the web’s long-term sustainability. Yet there was still a gap of a few years between the early 1990s buzz around networked resources and the rise of fears around its ephemerality five or so years later. In part, this reflects the time needed for the web’s dominance to be clear, as it was still part of a broader ecosystem of internet access platforms including Gopher and WAIS.

Nineteen ninety-four was also the year the Commission on Preservation and Access teamed up with the Research Libraries Group to launch the Task Force on Archiving of Digital Information. The Task Force would produce a series of reports throughout 1995 and 1996, culminating in a well-received final report. The twenty-one-member Task Force included librarians, private-sector representatives, and publishers, who collectively explored key problems in the field with the aim of ensuring “continuing access to electronic digital records indefinitely into the future.”10 This broad membership was key. As Task Force member Hedstrom recalled, “it tied research libraries and archives and digital preservation in a way that was quite holistic for its time.”11

The Task Force’s final report explored themes that would influence the digital preservation world. The report opened with an evocative image, presaging the rhetoric of a digital dark age: “Today we can only imagine the content of and audience reaction to the lost plays of Aeschylus. We do not know how Mozart sounded when performing his own music. We can have no direct experience of David Garrick on stage. Nor can we fully appreciate the power of Patrick Henry’s oratory. Will future generations be able to encounter a Mikhail Baryshnikov ballet, a Barbara Jordan speech, a Walter Cronkite newscast, or an Ella Fitzgerald scat on an Ellington tune?”12 The digital age was dawning, the report noted, with “virtually all printing and a rapidly increasing amount of writing” being done digitally. Yet this needed to be considered in a context of ever-increasing complexity. Digital information’s life might become, echoing Hobbes, “nasty, brutish and short.”13 While the Task Force recognized the long history of digital preservation, the web was an accelerant. Everything would soon be transformed.

How could a library or archive preserve a hyperlinked networked resource if one could not preserve all of the other resources that were hyperlinked from it? Was this content not an integral part of the document? Indeed, the Task Force noted the challenges brought on by an interconnected resource like the web. The report used a network metaphor to explore the question: “If the integrity of these objects is seen as residing in the network of linkages among them, rather than in the individual objects, or nodes, on the network, then the archival challenge would be to preserve both the objects and the linkages, a task that would today be exceedingly complex.” The “stop-gap measure would be to treat the network in terms of its component parts and to take periodic snapshots of the individual [web] objects.”14 This was an early preview of the defining challenge of web preservation: hyperlinks and the issue of completeness.

By 1997, then, thanks to these efforts, the importance of preservation was now understood by many in the library and archival field to be a defining challenge. Two highly cited and influential papers illustrate this. One of them was by an individual long involved in the electronic records field, bringing earlier expertise to bear on this new challenge. This was Hedstrom, by then an associate professor at the University of Michigan’s School of Information and a Task Force member, who minced few words in a 1997 article entitled “Digital Preservation: A Time Bomb for Digital Libraries.” Hedstrom noted that “new technologies for mass storage of digital information abound, yet the technologies and methods for long-term preservation of the vast and growing store of digital information lag far behind.”15 The intersection of “mass storage” and “long-term preservation” was, to Hedstrom, a ticking time bomb. As Hedstrom recalled, this was her most-cited publication, which, to her, was surprising given it was in the somewhat niche journal Computers and the Humanities: “That piece, I would say, was kind of putting a stake in the ground based on where things were at the time. And I have to say, in all honesty, for me personally, it was kind of like, OK, this is the problem and, you know, I don’t have a lot more to say . . . I don’t have a lot more to say about this.”16 The time bomb was a provocative metaphor. Act now, or irreparable damage would be done.

The second pivotal paper was “A Digital Dark Ages?” by Terry Kuny. Kuny raised the special challenge of web preservation. “Libraries which seek out materials on the Internet will quickly discover the complexity of maintaining the integrity of links and dealing with dynamic documents,” he noted.17 Kuny recalled this piece as a similar sort of summative, state-of-the-field article: “[It was me] speaking to the converted in some respects, but saying, ‘basically, this is a huge challenge. I don’t know that we’re up to it.’ [laughs] I don’t know that anybody is up to it, actually.”18 Both accounts were prescient about what lay ahead and helped set the agenda. A digital dark age was on the horizon, and society needed to act quickly.

Within a few years, research libraries would understand digital preservation as a core task. As the web rapidly grew, the confluence of this intellectual activity and the web’s expansion would lay the foundation for libraries to move into action. Would they be ready in time? For there were now growing fears around the idea of an irrecoverable cultural loss. Concerns mounted that as people moved onto the web, their information could quickly disappear. Would the web be a place where knowledge went to die?

The Dead Media Project

Lisa Gitelman notes that “all media were once new.”19 Just as we continue to call the web “new media,” the phonograph was cutting edge in its heyday. New media becomes old media, which in turn can become—as science fiction author Bruce Sterling evocatively put it in 1995—dead media. Sterling presented his “Dead Media Project” as a way to challenge the “newness” of new media. Sterling and his message would introduce ideas of digital preservation and obsolescence to a much broader audience than those reached by scholarly journals and professional task forces. As Sterling articulated the problem, digital preservation was not just an intellectual or academic concern. It was a problem for society.

Addressing the Sixth International Symposium on Electronic Art in Montreal, Sterling coined the phrase “dead media.” This was an evocative and influential framing. As Tara Brabazon argues, the “term captured lost, marginalized, and obsolete media. It was part archive, part nostalgia, part requiem.”20 Sterling’s speech, subsequently published as the “Life and Death of Media” manifesto, articulated the ephemerality of digital media. “Before we install the latest hot-off-the-disk-drive version of Windows for Civilization 2.0,” Sterling argued, “we ought to look around ourselves very seriously. Probably, before leaping into postmodern ecstasy into the black hole of virtuality, we ought to make and store some back-ups of the system first.”21 To do this would require a rethinking of society’s relationship with technology.

Sterling argued that society needed to move past an implicitly Whiggish narrative of “technological history.” In other words, the model of ever-increasing improvement toward an enlightened present inevitably left little room for technologies that were not part of the main narrative. Sterling contested the techno-utopian narrative of unfettered progress. He argued that media history was governed by a paradigm of progress: “all technological developments have marched in progressive lockstep, from height to height, to produce the current exalted media landscape.”22 What if somebody wrote a history of all the inventions that did not fit into this narrative of progress, to instead consider the new media that became dead media? Sterling proposed that somebody (not him: “someone else”) should write The Dead Media Handbook, “a field guide for the communications paleontologist.”23

Sterling made the fragility of digital information clear to his artistic audience. At one point in his speech, Sterling gestured at his laptop computer—“a Macintosh PowerBook 180.” He noted that it was an impressive machine but that ironically the “name PowerBook somehow suggests that this device can last as long as a book, though even the cheapest paperback will outlive this machine quite easily.”24 This was important. Sterling continued:

Suppose you compose an electronic artwork for an operating system that subsequently dies. It doesn’t matter how much creative effort you invested in that program. It does not matter how cleverly you wrote the code. The number of man-hours invested is of no relevance. Your artistic theories and your sense of conviction are profoundly beside the point. If you chose to include a political message, that message will never again reach a human ear. Your chance to influence the artists who come after you is reduced drastically, almost to nil. You are inside a dead operating system.25

In other words, to Sterling, you “have become dead media.” Something needed to be done.

The energy around his call to action led to the formation of the Dead Media Project. The eloquent and forceful nature of Sterling’s speech gave the dead media manifesto enduring life beyond Montreal. The following year, in 1996, Sterling, joined by fellow science fiction author Richard Kadrey, cofounded the Project. In their coauthored “Modest Proposal and a Public Appeal,” Sterling and Kadrey made the case that new media does die. As they explained, everybody knew of newspapers, TV, video, cable, but perhaps not the “Edison wax cylinder,” “The Pandorama,” or the “teleharmonium.” Just as businesses no longer used pneumatic tubes to send information, Sterling mused, “How long will it be before the much-touted World Wide Web interface is itself a dead medium? And what will become of all those billions of thoughts, words, images and expressions poured onto the Internet? Won’t they vanish just like the vile lacquered smoke from a burning pile of junked Victrolas?”26 Through these statements, which articulated what was at stake, Brabazon argued that Sterling “granted the internet a history and ensured that it was part of a wider analysis of media, communication and identity.”27

The intellectual ferment and energy behind the project’s conception would ultimately be more important than its execution. The project’s listserv, which grew to around 600 active subscribers by 1999, was a place where members could submit examples of dead media, which Sterling would subsequently edit and distribute.28 Yet, in an ironic twist, the project itself languished and began to degrade through neglect. While it has been partially restored today, Brabazon’s observation that “if there is anything sadder than dead media, then it is dead links from a Web site on dead media” rings true.29 The community had fallen victim to digital obsolescence.

Yet the project provided crucial historical context and awareness of the web’s ephemerality. That the Dead Media Project itself became obsolete does not occlude its intellectual contribution. Sterling made it clear that, given the ephemerality of so much new media, he did not “expect the Web to last very long indeed, at least not in its present form.” After all, even by 1999 as he explained in an interview with a new media journal, “there are large numbers of abandoned websites on the Web that were partially constructed and then left to rot in cyberspace. And have you tried using ‘gopher’ or ‘WAIS’ lately?”30 Under Sterling’s model, preservation was articulated as less a default outcome and more an exception. It was another blow for the Whiggish vision of progress. What would this, however, mean more broadly for our historical record? Would we be on the verge of a digital dark age? Sterling suggested that new platforms—before they could reach a critical mass and become ubiquitous—were especially risky: it was still unclear what direction the web itself would take. What if everybody built a vibrant culture on the web, only to lose it all? Would we be witness to a mass erasure of history?

The Specter of a Digital Dark Age

Enter the idea of a digital dark age. The concept was best articulated in a January 1995 Scientific American article by RAND Corporation researcher Jeff Rothenberg. Rothenberg explored the difficulties in preserving digital data. What if, in fifty years, his grandchildren found a CD-ROM? Could they read the physical medium, and even if they could, what about the file formats within? This was not a new challenge. Rothenberg nodded toward the apocryphal fears of the 1960 American census. But as everyday people began to move to the “digital,” he argued that this was going to become an increasingly pressing issue.31 Writing for a popular audience, Rothenberg evoked a mental image that would dominate the popular understanding of digital preservation for the coming decades.

Echoing Rothenberg in his 1997 presentation to the International Federation of Library Associations (IFLA) conference, Terry Kuny stressed that “being digital means being ephemeral . . . it will likely fall to librarians and archivists, the monastic orders of the future, to ensure that something of the heady days of our ‘digital revolution’ remains for future generations.”32 Stewart Brand, the technologist behind the late 1960s Whole Earth Catalog and later cofounder of the Long Now Foundation, was also raising concerns by the late 1990s that records were being quickly lost. “We can read the technical correspondence from Galileo,” Brand argued in 1998, “but we have no way of finding the technical correspondence [of the digital era].”33

The use of a historical argument made the digital dark age framing so effective. It was often informed by a personal experience of the digital historical record slipping away. Reflecting almost twenty-five years after his “digital dark age” paper was presented at the IFLA conference, Kuny noted to me that his ideas came out of personal experience: “I realized that, as I started getting a little bit older and moving into the library community, even my own personal digital footprint was disappearing. And it wasn’t even accessible to me. I was having my own personal ‘digital dark age’ all the way through, and it continues. I see it happening all the time. I’ve had a big digital life, but I haven’t been able to maintain the record of my own digital life.”34 Early technology adopters, who saw their own private and professional records face obsolescence before the wide adoption of personal computing, would be key in motivating early concerns. The limited spread of personal computing meant that there was not yet a popular groundswell of stories about digital loss, but early adopters set the stage.

Those working in the archives and records field had, of course, worried about the digital dark age before it became a media trope. Asked about the term, Edward Higgs, a historian who worked at the United Kingdom’s Public Record Office in the early 1990s, and who would help drive early scholarly work in this field, joked, “I think I invented the term actually at some point! I mean, it’s such an obvious thing to say. So probably hundreds of people were using it . . . We were very concerned.”35 Hedstrom recalled that the term itself “came out of kind of the records management world,” reinforcing Higgs’s tongue-in-cheek origin story.36 It was clear that this idea was increasingly widespread by the early 1990s. Others, however, questioned the “sense of panic” inherent in framing digital preservation as a dark age. Paul Koerbin, who began working on the Australian web archive in 1996, recalled that it was not “quite as dramatic as . . . a ‘digital dark age.’ It was just all this material, [these] publishing formats that we’re not collecting.” Perhaps naïvely, as he recalled, Koerbin figured that “you know, once we get [these projects] up and running, we can deal with this.”37 Koerbin recalls that it was viewed as an opportunity: “I heard a lot more about visions of being able to ‘time travel’ through the past web rather than falling into a black hole,” portending the prospect of a golden age of memory.38

We have seen these kinds of source gaps before, even if they lacked the evocative framing of a dark age. For example, television archives remain mostly inaccessible to historians. Television broadcasts are “even more ephemeral than the Internet.”39 Not subject to legal deposit, and mostly produced under a copyright regime that required a station to keep just one copy, television has remained largely off limits, and historians’ understandings of the postwar world have suffered as a result. But, perhaps because of the power of digital storage and the democratic prospects of digital media, it was the digital realm—and the web in particular—that ignited fears of a digital dark age. Television was a broadcast medium; the internet and the web were even more complicated because of their publishing dimension. Due in part to the techno-utopian ferment of the time, many commentators assumed that this problem would eventually be solved. Indeed, to most, it seemed like an article of faith. But in the meantime, commentators worried about how long a solution would take. How long would the “gap” between the adoption of digital media and a long-term preservation solution be?

The potential length of this gap varied. Danny Hillis, who, as we will see, was instrumental in the digital preservation field more generally, noted that “from previous ages we have good raw data written on clay, on stone, on parchment and paper, but from the 1950s to the present recorded information increasingly disappears into a digital gap. Historians will consider this a dark age.”40 As Brand noted as late as 1999, “with digital media it is increasingly possible to store absolutely everything. The traditional role of the librarian and curator—to select what is to be preserved and ruthlessly weed everything else—suddenly is obsolete.”41

The digital dark age was not just an issue of obsolete disks no longer fitting into a disk drive, or of hard drive faults. Those are problems, of course, but solvable ones. As we have seen, the most significant issues are policy and institutions. Hedstrom mused on this point, noting that the “field in general got kind of hung up on some of the wrong things . . . in particular, [it] got hung up on technology, obsolescence and formats.”42 To her, the digital dark age came from institutions’ decision to fix content and keep it offline (and thus inaccessible). Kahle, who would later found the Internet Archive, waxed eloquent at length on this point in conversation with me. The specter of a digital dark age haunted him “every day” of his life, but what he found most interesting was how others understood it:

It’s been interesting to see other people’s ideas of what the threat vectors are [of a digital dark age]. You know, how is it going to happen? And it’s changed over time . . . It’s interesting to see what other people’s, you know, what part of the elephant of the preservation problem they see. And whether it’s the hard drives . . . or is it going to be institutional instability? . . . The biggest problem I see is corporations, the rise of corporations, which is just this viral disease that has really hit the world really since World War Two.43

As Kahle explained, “I thought this was a technical problem. This isn’t a technical problem.” The problem was that corporations don’t have long-term perspectives. Copyright further compounded this. Hedstrom recalled this of the Task Force on Archiving of Digital Information: “And so the important point, which I think came from, as I recall, kind of came from Don Waters, was that the intellectual property owners are kind of the first line of defense against losing stuff. Right? And if they can’t take care of stuff on their own, then they’ve got to negotiate somehow on their intellectual property rights.”44 As we will see, the Internet Archive’s foundational structure and early activities grew out of both realizations: that for-profit enterprises and copyright lie at the heart of these challenges.

Both Kahle and Hedstrom were correct when they emphasized that the political and economic challenges of digital preservation would prove more vexing than the technological ones. Indeed, the rate of file format change has slowed, thanks in part to the increased file sharing made possible by the web, which may have led to format consolidation. As Hedstrom noted to me, “the Web was a real boon to just being able to keep things going and accessible and moveable . . . when you now need to be able to exchange things in real time, it becomes much, much easier to exchange things over time.”45 This was hindsight, of course. At the time, it all seemed overwhelmingly challenging. Would our human record go the way of the dinosaurs? At Microsoft headquarters in Seattle, one executive was making that connection.

“Save the Web”: Nathan Myhrvold and the Mainstreaming
of Web Archiving

A dinosaur brain set Nathan Myhrvold down his path to web preservation. Looking at a “plaster cast of the tiny brain pan of a Tyrannosaurus rex,” Myhrvold recalled, “reminded him how few fossil records the dinosaurs had left behind.” From there, Myhrvold began to think about the records that we leave behind—and how, in 1996, that would inevitably involve the records humans were posting on the web. “And in a conceptual leap worthy of Mr. Myhrvold’s training as a physicist,” wrote Denise Caruso in a New York Times profile, “this thought set him to worry about the Internet and the World Wide Web.”46

Myhrvold, Microsoft’s chief technology officer between 1996 and 1999, is fascinating for both his sudden arrival on the web preservation scene and his almost equally sudden departure from it. Becoming a leading figure amongst web preservationists in 1996 thanks to a widely circulated memo, Myhrvold gave a publicized plenary address on the topic in 1997, before turning to his many other endeavors and leaving the preservation conversation.

Myhrvold brought an interesting background and perspective to bear on the field’s problems. By the age of twenty-four, Myhrvold had already earned a doctorate in math and completed a year of a postdoctoral fellowship with Stephen Hawking at Cambridge University, before pivoting to found a Silicon Valley technology company in 1984, which Microsoft acquired in 1986. He then became Microsoft’s director of special projects and eventually its chief technology officer. Beyond his everyday supervision of Microsoft’s software development portfolio, Myhrvold became “[Bill] Gates’s strategic planner and futurist.”47 Internally within Microsoft, he became known as the “Insider as Outsider,” releasing “several times a month . . . lengthy memorandums (which can run to nearly a hundred single-spaced pages) that question what Microsoft is or should be doing.”48

While many of these memos are now inaccessible, victims of email’s ephemerality and corporate privacy, those that remain demonstrate an expansive scope. Consider Myhrvold’s most famous missive, “Road Kill on the Information Highway,” a 20,000-word rumination on the internet’s impact. Written in September 1993, the memo grasped the social and cultural impact of networked communication.49 It captures Microsoft’s early grappling with the subject.50 Crucially, it portended future directions for Myhrvold when it came time for him to consider preserving society’s record.

Myhrvold had been reflecting since 1993 on the role that widespread storage of digital information would have in the lives of everyday people. He framed digital preservation in solutionist terms: a solvable problem whose resolution could go a step further and herald a golden age of memory. “Given the increase in storage on PCs, why not record every version of every file?” Myhrvold asked rhetorically in this memo; “high speed networks and new software will make this quite cheap.” Universal storage could be akin to an airliner’s black box, applied instead to a wider array of social contexts. Some of this was chilling: increased storage could, for example, keep archives of surveillance camera footage. Myhrvold grappled at length with the downsides (all people have told “a lie or done something that in retrospect they aren’t proud of”) but noted that “whether putting your life on line is good or bad, it is very clear that it will be both feasible and quite cheap. Given this I believe that it will be widely used in at least some circumstances.”51 For a memo written in 1993, it was prescient in how it grappled with the societal implications of storage . . . and the downside of an ever-present historical record.

Beyond the broad implications of cheap storage, Myhrvold saw computing as central to information distribution. Here he drew parallels with the printing press, reflecting his robust understanding of historiography. “It is estimated that Europe had on the order of ten thousand books just prior to Johan’s invention—within fifty years it would have over eight million . . . I believe we are on the brink of a revolution of similar magnitude. This will be driven by two technologies—computing and digital networking.”52 Such connections were in keeping with his broader historical worldview: Myhrvold thoughtfully compared new information systems with the industrial revolution, linking historical arguments to contemporary developments.53

These currents came together for Myhrvold when he saw the dinosaur brain. The brain spurred thinking about the fossils that our own society would leave behind (it might seem a stretch, but as the father of a one-time dinosaur-obsessed child, I can say that reading about dinosaurs does spur thinking on a much vaster time scale!). In March 1996, Myhrvold explained to the New York Times why he was so worried about losing web content:

“The Web is losing its history,” Mr. Myhrvold said. But with so much that seems irrelevant published on the Web today, what does it matter? Who needs to chronicle the human achievement of a Web site that is connected by live video feed to a toilet?

Not the point, according to Mr. Myhrvold. Over the last two decades, an historic shift has occurred as an enormous amount of human endeavor—culture, commerce, communication—has moved from the physical world into the realm of electrons.54

Myhrvold grasped the web’s evolutionary potential for historical research, distinguishing his approach from that of the digital preservation community (which was then somewhat focused on university or corporate records). He considered the forthcoming impact on ordinary people. “Every day the Web becomes more and more important in academics, business and ultimately contemporary culture itself . . . Sure, we were all writing, but if we don’t save it, it isn’t part of the historical record,” Myhrvold explained. Indeed, he wondered if people writing about the web in books and magazines might end up being better preserved than the primary documents themselves.55 As I have seen in my own research into the 1990s web, he was right.56

Myhrvold was thus among the first to correctly identify that web preservation was key not just to preserving the record of technical decisions and internet culture itself, but to preserving all culture as reflected in these new digital media. Perhaps because he occupied a front-row seat on network debates at Microsoft, he articulated that the web was the new printing press. Drastic action would be needed for its long-term preservation. Our collective historical record was threatened. It was not just a matter of preserving internet history but of preserving history on the internet. Growing out of this, by mid-1996, Myhrvold was appearing in the media as somebody who, as one magazine article put it, “lately has been championing the idea of an archive” of the web.57

This was bolstered by the widely distributed “Save the Web” memo that Myhrvold wrote. In the New York Times, John Markoff argued that the “rallying cry to archive the Web began last year when Nathan Myhrvold, the chief technology officer at Microsoft, sent an electronic ‘Save the Web!’ message to a group of colleagues. ‘The Internet isn’t naturally archival,’ he said. ‘The Net isn’t going to archive itself.’ ”58 Similarly, Slate’s Bill Barnes made the same connection, arguing that Myhrvold’s “Save the Web memo last year helped start the archive movement.”59

By spring 1996, Myhrvold’s concerns around the disappearing web brought him into conversation with a historian, Philip L. Cantelon, president of the historical consulting firm History Associates Incorporated. Cantelon brought Myhrvold as well as one of the internet’s founding figures, Vint Cerf (who, in 1973, had coauthored the foundational TCP/IP protocol that underpins the internet), together to discuss convening a conference to deal with the “danger of losing the documents and information necessary to write the history of our times.”60 They planned to hold the event in late 1997 but “all quickly agreed that the urgency of the problem required more expeditious action and the conference was scheduled for February 1997.”61 Invitations would be sent out, and in February, Myhrvold and Cerf cochaired the “Documenting the Digital Age” conference. This would be a major gathering of people to discuss the next steps for action.

Documenting the Digital Age: Historians and the Turning Point
of Web Preservation

The Documenting the Digital Age conference was a significant turning point in web preservation, bringing together historians, technologists, librarians, and archivists from across the private and public sectors. Conference cochairs Cerf and Myhrvold decided that they wanted to move the discussion “beyond the usual professional boundaries,” drawing instead on experts from “private and public sectors, specialists in archives, communications, digital technology, history, and the law.”62 A big problem like preserving the web as the future historical record of society would need diverse voices, from the entrepreneur to the archivist to the copyright specialist. Yet there are ironies to studying the conference. Despite its emphasis on ensuring web preservation and developing action plans, the conference materials themselves were not preserved.63 The conference website, unevenly preserved by the Internet Archive, fell victim to the dreaded 404 after only a few years.

Held between 10 and 12 February 1997 in San Francisco, Documenting the Digital Age was organized by History Associates, with support from the National Science Foundation (NSF), the telecommunications company MCI, and Microsoft. This sponsorship was in itself significant. MCI sponsored the event primarily thanks to Cerf, then its senior vice president of technology strategy; Myhrvold (presumably) brought Microsoft to the table. Donald J. Waters, then the director of the Digital Library Federation, noted that such sponsorship was rare: big business did not typically support such events. As Hedstrom noted to me, this sponsorship was key for the small field. “We were desperate at that time to try to get industry interested in this, and we were talking about things like: ‘could we get Microsoft to have, besides having a save button, to also have an archive button?’ ” Hedstrom recalled, raising the idea of a button that would comprehensively extract a website’s source code and send it elsewhere for preservation.64
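The “archive button” never shipped, but the underlying idea—capture a page’s source and deposit it somewhere durable, stamped with when it was taken—is simple enough to sketch. Here is a minimal illustration in Python; the `archive_snapshot` function, the in-memory HTML, and the file layout are my own inventions for illustration, not anything Microsoft or Hedstrom specified:

```python
from datetime import datetime, timezone
from pathlib import Path

def archive_snapshot(url: str, html: str, out_dir: str = "archive") -> Path:
    """Save one snapshot of a page's source, named by URL and capture time.

    A toy version of the 'archive button' idea: alongside "save,"
    comprehensively capture the page source and send it somewhere durable.
    """
    # Flatten the URL into a filesystem-safe name.
    safe = "".join(c if c.isalnum() else "_" for c in url)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = Path(out_dir) / f"{safe}-{stamp}.html"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(html, encoding="utf-8")
    return path

snapshot = archive_snapshot("http://example.com/", "<html><body>hi</body></html>")
print(snapshot.exists())  # the snapshot file now exists on disk
```

A real implementation would fetch the live page and its embedded resources over HTTP; the sketch sidesteps the network so the core gesture—one click, one timestamped copy—stays visible.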

Waters specifically noted Myhrvold’s presence, explaining that he “provided its keynote theme in a widely circulated memorandum in which he asked: ‘who will save the Net?’ ”65 The wide array of attendees was notable, with Slate’s Bill Barnes noting that the event gathered “experts from the computing, telecommunication, and archiving worlds to explore these issues.”66 User voices—notably historians—were relatively sparse, a point to which I shortly return.

The conference has gone largely unremarked upon in the literature, perhaps because of its ephemeral digital footprint. The historian Roy Rosenzweig observed that Documenting the Digital Age was an important event, a “partial exception” to the trend of historians’ nonengagement with archives in that it involved several historians “but only one university-based historian”—and yet its website had “disappeared from the web, [nor] is it available in the Internet Archive.”67 The original conference website was indeed not preserved. However, thanks to the relatively recent keyword search functionality in the Internet Archive’s Wayback Machine, I was able to find the postconference website at the Internet Archive, which provided the basics of the talks and structure. Hedstrom also provided me with an extensive array of documents from the conference, ranging from position papers to the NSF final report to correspondence surrounding the event.

Documenting the Digital Age covered many topics, from Myhrvold speaking on the overarching question of “Why Archive the Internet?” to specific points by archivists around what should be preserved, how to preserve it (Brewster Kahle and Donald J. Waters spoke specifically on this question), legal issues, and questions of search. The discussion culminated in the big question: “What do we do next? Who will take responsibility? Who will provide funding?”68

Reflecting on the conference, Hedstrom said it was rewarding. She specifically recalled Kahle’s energy and enthusiasm, as he had just begun collecting with his Internet Archive. Kahle had also begun to worry about takedown notices and other obstacles being thrown in the way of the Internet Archive:

Brewster’s kind of thing was: “Well, let them come after me. I don’t want to be in a position where I have to get prior clearance from anybody who might claim copyright in this stuff and before I capture it” . . . And the discussion was about that kind of thing. Like what would happen if somebody told you that they want their stuff eliminated and wiped out?

And it was more giving Brewster a little bit of: “OK, you have to act a little bit more like a grown up, but you don’t have to cave in completely.”69

For the young field of web archiving, it was optimism—let’s save as much as we can—tempered by realism and pragmatism. Such a conversation served as a useful bridge between long-serving practitioners and newly arrived technologists.

The precirculated papers and presentations covered a lot of ground. Kahle stressed that while the “documents on the Internet are easy documents to collect and archive,” haste was needed because of the short life span of documents. Otherwise, the web would be too unreliable to cite.70 Hedstrom imagined a hypothetical researcher exploring Gulf War Syndrome. As part of this thought experiment, the user discovers that there was an online discussion group. Some of the data has been archived only on magnetic tape, requiring a specialized workstation. Ultimately, after a series of obstacles, the imagined researcher abandons the project.71 Only through a series of forward-thinking interventions could this state of affairs be averted. Waters presented on the experiences of the Task Force on Archiving of Digital Information, echoing the Hobbesian line that the life of digital information would be “nasty, brutish, and short.”72

Myhrvold’s plenary explored how the “Internet is rapidly becoming a key method for communication and the dissemination of documents and ideas,” including email, webpages, bulletin boards, chat services, and the rise of indexing services to find all this information. This represented a revolutionary shift in publishing: “All of these aspects of the Internet are remarkably cheap, both in the absolute, and in comparison with other media.” Myhrvold’s central argument underscored his intervention:

These properties make the Internet a tremendous information resource. Technological trends suggest that the Internet will get a variety of new capabilities over time, such as the ability to easily deal with high quality video. The Internet is about all you could ask of an information resource.

Except one thing: the Internet is not naturally archival.73

There was tension around Myhrvold’s utopian vision of trying to save everything. In focusing on conceptual issues, he risked overlooking the very real technical challenges facing the field.

Indeed, much of the final report prepared for the National Science Foundation goes into detail around the extensive debates that came out of Myhrvold’s evocative call to “save it all.” Hedstrom recalled these debates in our interview. To her, Myhrvold “was sort of standing up talking about how you could save everything. And there’s no reason to think about what you save and don’t save. And I mean, those of us in the archiving world thought it was pretty, pretty naïve.”74 Myhrvold’s argument was: “I believe that it is incredibly dangerous to second guess future generations, and edit the historical record. We should archive all of the net that we possibly can. Ironically, it is probably cheaper and easier to store it all. Digital tape is cheap. Human time to categorize and edit is expensive by comparison. Leave the editing and selection for future generations—or their software agents.”75

These arguments, as we will see in chapter 3, would be enacted with some success by the Swedish national library. The Internet Archive, too, would eventually try to save it all—while this is an unachievable goal, it was something to aspire to. Debate at the conference on this point was extensive, reflecting the lack of consensus around the right approach to take. To some attendees, mass collecting would be postponing the inevitable process of selection (at the very least, it might need to be done by future researchers). Other participants feared that too much information being collected might obscure the data’s context, while others wondered if mass collecting would be the “best use of scarce funds.” Conference attendees were aware of the complexity facing them.76

This debate suggested a growing divide between a technologist approach to preservation—collect it all and sort it out later—and the traditional professional approach to preservation with an emphasis on curation, context, and descriptive metadata. Myhrvold sketched out a vision of future research processes that in some ways presaged the utopian spirit of the digital humanities a decade or two later. In this hypothetical future, researchers would use technology to navigate information, rather than reading pages one by one or consulting finding aids. Researchers would leverage emerging information retrieval technology. “Want to find out who started a particular idea, rumor or trend?” Myhrvold rhetorically asked, imagining a researcher who could find the first occurrence before moving on to related instances. Or, alternatively, to compare presidential elections in 1996 and 2000 by running “cross comparisons by searching and cataloguing sites . . . [t]raditional historical analysis will be possible, but so will many other new methodologies that are enabled by the information retrieval software.”77 Myhrvold was arguably correct about the falling price of storage but underestimated the challenges of processing and making usable terabytes of raw data. We will return to these debates in the following two chapters, as the Internet Archive and Sweden adopted a “collect it all and sort it out later” strategy, whereas Australia, Canada, and the United States emphasized curated and described collections that could be more immediately useful.

In sum, the conference was a comprehensive event that succinctly captured the state of web archiving in 1997 and considered potential future directions. The major players in the field had gathered for this initial conversation, scoping out the problem for the next generation. The attendee list skewed toward archivists, media (John Markoff from the New York Times and Bill Barnes from Slate), and libraries. There was one notable gap in the attendee list: few historians.78 The exceptions were MCI’s corporate historian (Adam Gruen, a historian of science), James B. Gardner (a consultant with History Associates Incorporated), and Rutgers University historian James Muldoon, who delivered an opening keynote address on the communications revolution.

Barnes summarized the event in Slate. The corporate sponsors and attendees provided a unique flavor. “Corporate executives complained that because their archives are routinely subpoenaed by plaintiffs’ attorneys, they have every incentive to shred their data instead of preserving them,” noted Barnes, adding that lawyers also “worried aloud about privacy and copyright concerns.” Attendees also discussed the ethical implications behind web archives, beginning a vein of discussion that endures today: “Should you have the right to exclude your public page from the archive? (Consensus opinion: Yes.) Should we be saving usage logs, which detail every page a person sees? (Probably not.) Doesn’t this whole thing violate current copyright laws left and right? (Almost certainly.) Should those laws be amended to allow such an archive? (Probably.)”79 At Documenting the Digital Age, discussions around the importance of preserving the web were percolating amongst a growing body of people. Reading the proceedings today, I was struck by the degree to which these conversations held almost a quarter of a century ago echoed the ones being discussed at today’s web archiving conferences.

After the event, there were attempts at organizing follow-up activities to build community and articulate next steps. A shorter (and smaller) follow-up meeting was held in May 1997 in San Francisco, sponsored by the Council on Library and Information Resources (CLIR), with an attempt to create an “agenda for further action.”80 Goals included assigning direct responsibility and continuing these conversations by bringing discussion points back to professional groups, whether governmental, academic, or private. The intention was to keep the momentum going by ensuring that the conversation continued in diverse settings.

Unfortunately, Documenting the Digital Age was not the catalyst its organizers hoped it would be. The event’s lack of long-term impact can be seen in just how difficult it was to learn about the conference, despite its high-profile attendees and supporters. Perhaps the combination of Kahle’s Internet Archive moving forward, along with the work of national libraries, meant that the immediate action items were handled. Yet the conversations were important, covering significant topics, and the event helped bridge a generational gap, bringing long-time practitioners into conversation with a new generation. Indeed, a documentary video would soon help crystallize and summarize many of these issues for a broader audience.

Into the Future: The Conversation Goes Broadcast

By 1997, the major currents of the digital preservation problem were increasingly well known across the library and archives field. Awareness was bolstered that year by the release of Into the Future: On the Preservation of Knowledge in the Electronic Age, an hour-long documentary directed by Terry Sanders. Drawing on interviews with digital preservationists (Yale’s Donald J. Waters, RAND Corporation’s Jeff Rothenberg), scholars (MIT’s Sherry Turkle and Michigan’s Hedstrom), and other web luminaries (Tim Berners-Lee and MIT Media Lab’s Michael Hawley), the film aired on PBS in January 1998, following its commercial release on videotape in September 1997.81

Into the Future is an engaging watch. The documentary opened by touching on the problems of information overload, format obsolescence, electronic books, and literature. It concluded with an in-depth exploration of the problems facing those who sought to preserve the web. The narrator set the stage: “The sheer quantity of digitized information, and the dynamics of an evolving computerized world, create complex problems. One of the most serious is that we pay little attention to preserving electronic writings for the long term, to making sure that important and irreplaceable work will be saved and be available not just for our own use, but for generations to follow. What’s increasingly at risk is survival into the future of recorded knowledge, the survival of collective memory, the core of civilization, the human record.”82 The film in part revolves around the hubris of digital creators. As Rothenberg explains in the film, computer scientists tend to “charge ahead into the future” without paying heed to “old, obsolete systems.”83 The film interviewed most of the important people in the field and discussed its major issues; notably absent, however, were Kahle and Myhrvold.

The documentary underscored the particularly vexing preservation problem of the web. Michael Hawley, then an assistant professor at the MIT Media Lab, explained that his team had tried to launch a web archiving project. “We thought it might be possible over a ten to twenty-day stretch to capture the entire content of the Web and put it in a little time capsule for future generations,” Hawley explained, noting that they were ultimately stymied. The “growth rates of data on the Web” meant that Hawley’s team was “no longer able to do that . . . the net [has] now grown past our ability to suck it back in.”84

Similarly, Rothenberg noted the problems around deciding just what to select, presaging a question that would vex archivists over the coming decades. If a document linked out to sixteen other sources, for example, would all sixteen of those need to be preserved? And what about the ones that those in turn link to? “You can think of the web as one huge interlinked connection of documents, you could think of it as a single document if you wanted to,” Rothenberg explained, “it’s dynamic, it’s changing every moment, people are adding things to it, modifying things to it.” Peter Lyman, university librarian at the University of California, Berkeley, added to this, noting the various fundamental issues underlying the creation of a digital library. How, Lyman asked, could web archiving best happen absent government support or even a centralized funding source? How could one develop “something more structured, something that thinks long-term about issues such as preservation, access, quality of information, quality of access?” As would be expected in 1997, the questions were many and the answers few.
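Rothenberg’s sixteen-links question is what later crawl scoping would parameterize as a depth (or “hops”) limit: follow links only so many steps out from a seed document. A minimal sketch of the idea, using an invented in-memory link graph rather than live HTTP fetches (the `crawl_scope` function and the toy graph are illustrations, not any archive’s actual crawler):

```python
from collections import deque

def crawl_scope(links: dict, seed: str, max_depth: int) -> set:
    """Breadth-first traversal of a link graph, capped at max_depth hops.

    Illustrates the selection problem: each additional hop can multiply
    the set of pages an archive commits itself to preserving.
    """
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # policy boundary: do not follow links any further out
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen

# A toy web: the seed links to two pages, each of which links onward.
toy_web = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d"],
    "c": ["e"],
}
print(sorted(crawl_scope(toy_web, "seed", max_depth=1)))  # ['a', 'b', 'seed']
print(sorted(crawl_scope(toy_web, "seed", max_depth=2)))  # ['a', 'b', 'c', 'd', 'seed']
```

Raising `max_depth` by one can multiply the pages in scope, which is precisely the open-ended commitment Rothenberg was pointing at.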

Into the Future was well received and widely reviewed. “How fast do archivists have to run to stay in the same place?” wrote Paul Wallich in a review of the film for Scientific American, highlighting the challenges raised by web archiving and concluding that “where the Web was once a map for finding useful information in the ‘real world,’ it is now a territory where that information, ever changing, resides.”85 The American Library Association recommended the film “for all librarians and informed lay-persons.”86 In the Information Management Journal, Juanita Skillman reviewed the film as “a strong wake-up call we all need.”87 Crucially, writing in the American Historical Association’s professional magazine Perspectives on History, Pillarisetti Sudhir gave an in-depth laudatory review that explored how the film contributed “a feeling of unease with the present fascination with electronic recording.”88 The wide range of voices who reviewed the film suggested that Into the Future made a significant intervention in how many of these communities viewed the web.

What was perhaps most telling about Into the Future was its reception by professional librarians and archivists. It is often difficult to gauge the degree to which something is widespread knowledge across a profession, but by 1997 it was clear that digital preservation was on the radar of many information professionals. “While those of us within the library and information professions may well learn from this presentation,” noted John Budd in a library journal review of the documentary, “it may be most effective to demonstrate clearly to college and university administrators, library boards, and others in decision-making positions the need for clear thinking with regard to information technology.”89 Sherelyn Ogden echoed this in Library Quarterly, noting that while “the points made in the film will be familiar to most librarians and archivists, they are made very well and bear repeating.”90 This was a significant shift worth underscoring. In 1990, the term “digital preservation” did not yet formally exist to cohere the range of activities beginning to take shape in the field; the Task Force on Archiving of Digital Information (1994–1996) then introduced many to this new world; and by 1997, digital preservation was sufficiently commonplace that not only could a PBS documentary be made about it, but its reception treated it as more or less assumed professional knowledge. This was a rapid development. The field had come a long way.

Into the Future also spurred further conversation. Writing the next year in 1998, Margaret MacLean and Ben H. Davis reviewed the film: “Even handled as it is in a low-key fashion, it is a sobering experience to witness prophets such as these acknowledging the enormity and seriousness of the problem—the lack of agreement, tools, or standards for ensuring the survival of cultural heritage in digital form.”91 But what to do? A gathering held that year would try to bring all these disparate strands together.

“This Is No Way to Run a Civilization”:
The Conversation Comes Together at Time & Bits

As 1998 dawned, discrete threads and conversations were happening across the nascent world of digital preservation. Librarians and archivists had wrapped up the Task Force on Archiving of Digital Information, Kahle had started the Internet Archive, and technologists such as Myhrvold and Sterling had brought the issue to their respective communities. Another community would come together with these groups in a large conversation: thinkers concerned with the long-term future of humanity. These new conversations would combine the tangible questions of electronic artists with the long-term philosophical thinking of the “Long Now.” This would happen at the Time & Bits conference, held at the Getty Art Institute in February 1998. This would be one of the last wide-ranging and high-profile gatherings on the topic for years to come.

It was appropriate that the conference was held at the Getty. Artists had been early web pioneers, drawing on the affordances of hypertext and new media to create rich online art and exhibitions. Despite web art’s vulnerability to digital loss, however, preservation concerns were largely absent from much of the early commentary on net art—perhaps a result of it being such a new medium.92 Yet by 1998, it was a growing concern as seen in the Getty’s decision to host this event. A year later in 1999, the New York City arts organization Rhizome would establish “ArtBase,” an expansive online archive for new media art.93 ArtBase would later emerge as a significant player in the digital preservation space. Early pioneering web-based art was at risk of disappearance.

“Time and Bits: Managing Digital Continuity” brought together an eclectic group of individuals. Some will be familiar names: Kahle, Sterling, Lyman. Others were new to the conversation: Stewart Brand, founder of the Whole Earth Catalog, who was then raising awareness around the digital dark age; the musician and innovator Brian Eno; Wired editor Kevin Kelly; virtual reality pioneer Jaron Lanier; journalist John Heilemann from the New Yorker; Broderbund Software CEO Doug Carlston; futurist Paul Saffo; and digital archiving specialist Howard Besser.94 Apart from a public session at the end of the two-and-a-half-day event, the participants met in a closed session. Their conversations built upon earlier private online discussions held before the meeting.

The event was sponsored by the Long Now Foundation, a group with complementary aims to those of the digital preservation community. The Foundation grew out of a project that had been bubbling since the mid-1980s in the head of Danny Hillis, the parallel computing guru who, as we will see in chapter 3, first hired Kahle out of MIT: the Clock of the Long Now, or the 10,000-year clock. Hillis’s idea was to “build a clock that ticks once a year. The century hand advances once every 100 years, and the cuckoo comes out on the millennium.”95 To build such a clock would require long-term thinking. The clock idea would bring people together to examine the different aspects of such a project: Stewart Brand thought about the organization that would sustain it (which would later become the Long Now Foundation), and Eno coined the name itself. A 10,000-year clock would be both a social and technical challenge, as Hillis noted: “Ten thousand years—the life span I hope for the clock—is about as long as the history of human technology. We have fragments of pots that old. Geologically, it’s a blink of an eye. When you start thinking about building something that lasts that long, the real problem is not decay and corrosion, or even the power source. The real problem is people. If something becomes unimportant to people, it gets scrapped for parts; if it becomes important, it turns into a symbol and must eventually be destroyed.”96 The Long Now Foundation was established in 1996 to develop two projects: the 10,000-year clock as well as its “Library” project, which would develop two main tools.
First, a “Rosetta Disk,” which, rather than being an optical disk, would contain over 13,000 pages in 1,500 languages “microscopically etched and then electroformed in solid nickel.” Pages could then be read through a “microscope at 650X as clearly as you would from print in a book.”97 Second, the “Long Server” project, “the over-arching program for Long Now’s digital continuity software projects.”98

The larger philosophy behind the Long Now centered on the idea of civilization as having stretched back some 10,000 years. It was inspired by extending the idea of “now”—a concept that might refer to timescales as various as this exact moment or the week one is living in—to a 200-year time horizon. As digital humanist and librarian Bethany Nowviskie has evocatively explained, the Long Now expounds a “puckishly provocative optimism in everything they do,” as opposed to complementary yet more pessimistic projects, like the Dark Mountain Project, that look toward the end of the world.99 Indeed, Brand’s account of the digital dark age was among the first to grasp the utopian prospect of big historical data. “If raw data can be kept accessible as well as stored,” Brand enthusiastically explained in 1999, “history will become a different discipline, closer to a science, because it can use marketers’ data-mining techniques to detect patterns hidden in the data.”100 This was the subversion of the dark age concept: what if a golden age of memory was instead upon us?

This kind of big conceptual thinking was characteristic of Time & Bits. The whole idea, as articulated by the conference organizers, was to “do some ‘out of the academy’ thinking.”101 Participants started by sharing problem statements, discussing them extensively, watching Into the Future, and then assembling for a closing panel discussion. The most pivotal background paper, prepared by Lyman and Besser, outlined the problem that society and this small gathering alike faced: “our digital cultural heritage is disappearing, almost as fast as it is recorded.”102 Their essay succinctly summarized where things stood in the field, notably in terms of networked information (must you follow hyperlinks to preserve a document?), strategies for preservation (including how selective a collector should be), and how lessons from earlier format problems, such as decaying acid paper, could apply to this new one. The discussions were recorded by the organizers and subsequently published by the Getty Institute. Brand also used the event as the basis of a chapter in his 1999 book The Clock of the Long Now.

The first task of the event, of course, was to understand the problem’s scope. What was data? “Anything that can be copied,” argued Wired editor Kelly. They then quickly, as a group, declared that, if possible, information should be saved in its entirety. As MacLean wrote: “The group agreed that it would be preposterous to propose any kind of selection criteria on what information should be saved. Many good reasons were cited, including the view that no one has enough wisdom to know which data might be valuable from another perspective in the future, particularly when you consider the analysis of large amounts of seemingly useless data which might have important information en masse.”103 A basic goal was thus established: try to preserve everything that can be copied. Brand would echo this, noting that one had to try to preserve everything as “you never know what will be treasured later.”104 Here the “puckishly provocative optimism” of the Long Now (per Nowviskie) was apparent.105 To make this point, Hillis brought a small replica of the Rosetta Stone to illustrate three things: the “impossibility of predicting future importance”; how losing something helped it last over the long term (the discovery of the Rosetta Stone, he argued, ironically made its long-term preservation less likely); and how, when the stone was discovered, it was “immediately recognized as something important.”106

The web presented an increasingly difficult problem but also a promising opportunity for future historians. Brand later noted that with the web, “preservation goes fractal: infinitely branched instead of centralized. Yet this leaves the question, Is the Net itself profoundly robust and immortal, or is it the most ephemeral digital artifact of all?”107 Optimistically, participants at Time & Bits began to focus on the web’s potential robustness—an inversion of the digital dark age. Could the web leverage collective wisdom to help preserve objects? Lanier gave an example of how online aficionados had been saving old video games, but of course, the propensity for links to break and servers to flicker offline made a strong case for fragility and thus the argument for active preservation.108

Bringing together the concept of the Long Now with the need for both preservation and access, Kahle proposed to the group an idea of “breaking the preservation into two parts: into access-oriented media—easier to read and write—and long-term, say 10,000 years, like a time capsule that’s really hard to write.”109 Yet, would even making all these copies and preserving them help when one thought along the lines of millennia rather than years? Lanier thoughtfully observed that so many assumptions were embedded in contemporary computing discourse: “When undergraduates come into computer science departments, they are told about the idea of a ‘file’ as it were a fact of nature, as if it were as fundamental and immutable as a proton. [Lanier] likes to point out that the idea of a file has become locked in place as an idea because of its use in systems. In fact, it is a human invention that resulted from decisions that might easily have gone another way. The first version of the Macintosh didn’t even have files.”110 These were big questions. How could one preserve culture for millennia, moving beyond underlying assumptions and past the pragmatic discussion of emulation, migration, and file fixity toward the philosophical question of what a society or civilization leaves behind? These diverse comments came together at the final presentations.

“This is no way to run a civilization,” declared Brand in his opening remarks at the concluding public forum. “Brewster Kahle pointed out that one of the peculiar things about the ’Net is that it has no memory. It’s as if it’s now the main event for civilization? We’ve made our digital bet. Civilization now happens digitally. And it has no memory.”111 By memory, Kahle referred to the ephemerality of these digital sources, as opposed to papyrus and other print materials. Next to the stage where Brand hosted the event and called up all of the speakers to discuss their thoughts on the issue stood sculptor Alan Rath’s art installation “World Wide Web, 1997: 2 Terabytes in 63 Inches”: a tower of four rack-mounted CRT monitors, which would display pages from “500,000 sites gathered and stored by Alexa Internet.” Archived web pages flickered behind the on-stage panelists. To introduce what was at stake, a screening of Into the Future was held immediately before the discussion.

As befitted the only active web archivist at the gathering (and one of the few in the world), Kahle was optimistic. “The first reaction,” Kahle explained, “tends to be, ‘Oh my God, it’s all going away.’ ” For an example of that perspective, he highlighted Into the Future. Yet Kahle offered more hope than the documentary did. “There’s also this twinkle that comes up, which is, now that this is in digital form, we can do fantastic new things that we were never able to do before, in terms of making sense of it all, collecting it, data mining it, moving it all forward.” Kahle argued that they needed to “try to preserve it all” but warned that “if we don’t adapt soon, we’ll go through a dark period.”112 Hillis, Kahle’s former colleague from his supercomputing days, then raised apocalyptic warnings of a “digital gap.” “The historians of the future will look back and there will actually be a little period of history around now where they really won’t have the information,” Hillis warned, emphasizing that it was “really the first time that the basic creations of a civilization are being stored on media that won’t last a lifetime.”113

While the conversation was intellectually diverse, the attendees were not. Apart from MacLean, one of the organizers, all the active participants were men. Hedstrom, given her work in this field, was notably absent from the stage, as were many other voices from the field whose work we have seen in this chapter.

The other omission was that of future users: where were the historians? And, for that matter, where have they been up until this point in the chapter? Across the Dead Media Project, Documenting the Digital Age, Into the Future, and Time & Bits, historians were mythical constructs: imagined in an idyllic future state, poring over future websites, rather than the actual professionals working at that contemporary moment. Surely historians would be interested in this material, it was largely assumed, and would be able to enjoy these fruits of abundance. Discussed in theory, they were rarely present. Of course, it was true that historians would need to know about the present in the future, and to do so they would need access to these kinds of records. But what would that access look like? Just where were the historians? It turns out that the timing was simply not ideal, as by the mid-1990s, mainstream historians were at the nadir of their engagement with technology.

Historians in the Digital Wilderness

Despite the professional caricature of historians as uninterested in technology, historians have a long track record of engaging with digital information, including critical questions around digital preservation and access.114 Within the historical profession, these early encounters were driven by a wave of digitally assisted historians in the 1960s and 1970s who were then relatively central to the discipline. Historian Robert Swierenga posited in a 1970 retrospective of the “computerized research” field that historians were then in the midst of a third wave of computational history, following the 1930s punch card users who sought to tabulate quantitative data such as land mortgage information and a second wave of 1950s and 1960s scholars who used sophisticated machines to understand historical demography.115 The interdisciplinary journal Computers and the Humanities was established in 1966, and indeed, by 1970 the prospects for “computer-assisted historical projects” augured a wholescale transformation of the discipline.116 Would all historians become digital? The peak of this conversation happened in the 1970s and early 1980s. When digital archivists and librarians convened the first conversations on electronic records in the 1960s and 1970s, historians were prominent participants. That early prominence would make their absence during the web age all the more notable.

An early encounter was the 1968 Conference on the National Archives and Statistical Research, held at the National Archives of the United States. Occurring during quantitative history’s apex, this conference convened historians, archivists, sociologists, and other scholars who were concerned not only with the use of digital records for historical research but also with how to ensure that contemporary digital records would be preserved for the next generation. The issue at hand was the deluge of information accumulating at the archives. The “records since World War I far exceed in volume all earlier records in the National Archives,” noted Meyer H. Fishbein, head of Records Appraisal at the National Archives.117 James B. Rhoads, Archivist of the United States, warned of the “vast quantities of data” being accumulated by his institution.118 These dire predictions anticipated those that would come in the 1990s in both substance and style. Economists and demographers made strong cases for the need to preserve information in a machine-readable format, and they even raised the prospect of remote access to archival holdings.

If historians had been strongly represented in these conversations during the 1970s, however, this changed by the mid-1980s and 1990s. Mainstream historians retreated from quantitative, and thus computational, work. The controversy around Time on the Cross, a quantitative history of slavery, as well as arguably hubristic overreach (Le Roy Ladurie’s 1968 claim that “the historian of tomorrow will be a programmer, or he will not exist”) combined with other factors to lead to the approach’s relatively rapid decline.119 As Edward Higgs recalled, hostility emerged amongst historians toward these quantitative, digital practitioners. “They were coming out with this sort of stuff saying, you know what, all historians are essentially out of date,” Higgs recalled. “And all this stuff got up people’s noses, something dreadful.” Meanwhile, the cultural turn became prominent, leaving digital and quantitative historians sidelined and—a term no historian wants to be associated with—niche.120 Postmodernism was dominating conversations. As the stability of primary sources and their meanings disintegrated, perhaps less attention was paid to new kinds of records.121

While what would later become the digital humanities was percolating in the background throughout this period, it would not compensate for this shift that was underway. It is somewhat surprising that there was not more overlap between the nascent field of the digital humanities and quantitative historians, given a shared interest in computers. This perhaps stemmed from the emphasis in the digital humanities toward computational literary studies and its attendant emphasis on marking up documents for analysis, an approach that did not easily scale with electronic records. In any case, the shift for historians away from quantitative methods and toward more traditional social history meant that historians became increasingly disconnected from pathbreaking digital projects.122 As networked communication arose in the 1980s, and the web by the early 1990s, the timing was terrible.

Those historians working in the world of electronic records were among the first to notice the change that was happening around them, as their archivist and librarian colleagues became alarmed about these new records. Higgs recalls worrying about the longevity of email records, in a context where the Public Record Office would become involved only twenty-five years after the creation of a document: “Now, people are not going to be hanging on to things like emails for 25 years. And that was the sort of thing that was concerning me and other people at that time.”123

This concern began to percolate into a small body of historical scholarship. One of the first explorations of historians and born-digital records came in a groundbreaking special issue of the journal History and Computing on electronic records. Edited by R. J. Morris, the noted social historian and expert on class formation in nineteenth-century England, the issue looked ahead to what the social and economic records of the 1990s would look like to a historian in fifty or sixty years. Many of the social and economic historians used government records and thus were attuned to the shift underway as governments moved toward electronic records. This special issue, “Back to the Future: Historians and the Electronically Created Record,” appeared in 1992. The editor’s introduction explained the problem at hand: “When did you first hand somebody a text on a disk or send it by e-mail? Almost certainly this did not represent a sharp break in continuity in your practice as a historian. The implications of the changes taking place in the nature of the historical record needs to be assessed by historians even before that process of change is complete.”124 The editorial expressed fears that historians might look back to the two centuries between 1750 and 1950 as the “golden age of paper based history, with few telephones and almost no computers.”125 The problem of digital preservation was well explained from the historian’s point of view, as the editorial pondered whether a policy historian, for example, should “at least have the ability to experience the data as it was experienced by the historical actors at the time?” This might require the use of “preservation or reconstruction of the main frames of the 1960s, or . . . software which simulates SPSS version one running on a KDF 9.”126 Similarly, document types were changing, leading to questions around just what a document was and, relatedly, what should be preserved.
For 1992, this was remarkable language in a historical journal. Yet of the four articles, only one was written by a historian. The editorial lamented this absence, noting that “it is clear that this debate will remain incomplete without a great participation from practising historians.”127

The historical contribution, “Virtual Records and Real History,” by Ronald W. Zweig was an important one. Zweig was a political and diplomatic historian, bringing a different perspective than an economic or social historian exploring tabular data. As Zweig noted, political historians until the 1990s had more or less not needed to consider electronic records. “This situation is quickly changing,” he argued, “as the first machine-readable textual records deposited in archives are being opened to research.”128 This would bring new challenges. The “guardianship of office records” was shifting toward IT personnel, who “are not known for sentimentality or their interest in records that they have never seen or handled,” a problem compounded by documents that could “contain links and pointers to many other (interlinked) files of ‘documents’ so that the hypertext links are part of the information that the document contains.”129 Zweig expressed these worries but ultimately remained optimistic, noting that having digitized documents would open new frontiers. “Computerized records will make it possible to use sophisticated search and retrieval techniques,” he prophesied. While mere keyword searching would produce too many results, “if they are combined with an understanding of linguistic equivalences, proximity and Boolean searches, and other techniques used in text retrieval, it will be possible to control the results of a search and to improve its quality.”130 This was an early contribution to what would later become the field of computational history.

The following year, in 1993, the edited collection Electronic Information Resources and Historians was published. Coedited by Seamus Ross of the British Academy, and Higgs, the collection grew out of a June 1993 conference.131 Hedstrom recalled the event as opening “the door to a whole bunch of international things” critical to building community and fostering international engagement.132

Echoing themes from History and Computing’s special issue, Ross opened the collection with the thoughtful “Historians, Machine-Readable Information, and the Past’s Future.” “Awareness among historians of the changing character of contemporary information resources is limited,” Ross noted. Paper records were giving way to electronic ones not only within government but also in commercial operations and even in the consumer realm. Ross was hopeful: “The sheer quantity, diversity, and rich quality of the electronic information resources . . . would seem to indicate that the preservation of the information in electronic form could provide historians with a better opportunity to understand our period than the paper records alone could ever do.”133 His vision was prescient. This “age of electronic records,” as Ross understood it, could swamp “future historians with vast amounts of digital information [that might] impede their research as they attempt to navigate through it.”134

These fears were echoed by historians contributing to the edited collection. Kevin Schürer, a demographic historian who was also assistant director of the British Economic and Social Research Council Data Archive, noted that the long tradition of historians and computers needed to be considered. “Consequently, given current trends in computer-usage, surely it is not all some pervasive technophobia that has caused historians to start sounding the alarm bells in warning,” Schürer wrote. He further encouraged historians to “learn the skills required or suffer the consequences,” but he also noted that this meant more than learning to code.135 What would be needed was an approach to understanding computing that would be more akin to paleography. Perhaps even “technological advances allowing the ‘reconstruction’ of otherwise obsolete software” would be possible, opening the door for a technological solution akin to DNA analysis or radio-carbon dating in other fields.136 This utopian approach to technology stood opposite the apocalyptic rhetoric of the digital dark age. Perhaps a golden age could dawn after all.

Personal electronic communication was still young in 1993. Accordingly, most of the discussions focused on government and commercial records. In a companion piece, however, Schürer raised the prospect of “the diarist, novelist or would-be intellectual sitting at home with [their] word processor.” How could a biographer understand them if only the final product was deposited in an archive?137 Other issues including hypertext and context complicated the matter further, and Schürer’s stressing of the context of its creation was an important one.

Morris offset the optimism with a less sanguine perspective: “Last week (June 1993) I brought home a letter from my daughter. It had been sent by e-mail, transferred to a 3.5″ floppy disk and as a source of information was useless without specific software and hardware. The medium was no longer the message. Access needed a technologically sophisticated method of intervention. It was no longer enough just to know how to read.”138 Email was key. “The age of the network,” Morris noted, would be “by far the most imposing of the problems faced . . . it is not clear we even have the intellectual concepts needed to talk about the issues we faced. The meaning of simple ideas like document, text and context, of provenance and sequence fall slowly and inelegantly apart.”139 Answers were not yet there but could be found through historians becoming aware of contemporary information issues, archivists working with institutions at the moment of record creation rather than thirty years in the future, and more attention to the internet.140

By 1994, then, a small number of historians and information scholars had clearly realized and articulated how important preserving web and network-based material would be for future research. They were forward looking. At the time, the web’s dominance was not assured. Few were on the web, and competing internet protocols like WAIS or Gopher could still have eclipsed the web as the main way in which people would navigate the global network.

In April 1994, a conference hosted in Hampshire, United Kingdom, tackled the impact that computer networking would have on the humanities. It was a wide-ranging event, addressing topics as varied as preservation, access, digitization, electronic publishing, and organizational impacts. In his conference introduction, Seamus Ross introduced problems of digital archiving and preservation in one of the first recorded reflections on the difficulties of web archiving to come: “How will networked communications and scholarship be archived? Who should have access to the archive? What levels of documentation should be retained and how should it be generated? What standards of data encoding, compression, and storage media should be used? Who will finance the preservation? What criteria for selection will be used? . . . Are email messages more akin to oral communication than textual sources?”141 Other presentations contemplated potential solutions to these overarching questions. Sir Anthony Kenny, chairman of the British Library’s Board, presciently noted in his keynote address the need to expand legal deposit regimes to encompass electronic material.142 Hedstrom later stressed the importance of archivists thinking expansively: “As more individuals, informal work groups, and ‘virtual’ communities use networks to communicate, carry on discussions, and conduct business, archivists will need to understand these forms of communication as well as they understand the use of electronic systems in more traditional organizations.”143 Yet historians were still absent. Looking backward in a 1998 essay, Ross would accurately note that “awareness among historians of the changing character of contemporary information resources has until very recently been limited.”144 Indeed, after the initial flurry around History and Computing, historians seemed to disappear from the conversation as quickly as they appeared.

This rapid crescendo and subsequent wane of the conversation amongst historians around electronic records, primarily in the years between 1993 and 1995, is a bit surprising. Higgs ruminated on why scholars in this area tended to “come together, [do] a bit of networking, and then [they] tend to dissipate and people drift off into other things.” He wondered if that was perhaps a combination of boredom and, more importantly, the niche nature of this field of work. “I wonder if it’s such a niche thing [that] people didn’t really get promoted?” Higgs speculated to me, adding that “so it didn’t fit into intellectual structures, and it didn’t necessarily fit into career paths . . . [history] is a profession, and it is a career structure. And, you know, you don’t get very far from being a niche player.”145

Given the niche nature of this scholarship, for many historians these issues would only come to the forefront with American digital historian Roy Rosenzweig’s June 2003 American Historical Review article “Scarcity or Abundance? Preserving the Past in a Digital Era.” Given Rosenzweig’s importance in the field, it is worth exploring his approach in some depth. Within the North American, English-language historical profession, the American Historical Review is the top-tier flagship journal, with articles enjoying wide professional readership. For many historians, Rosenzweig’s article would be their introduction to the conceptual flood of born-digital resources that accompanied the web, as well as an introduction to the broader transformation of electronic records.146 The American Historical Review both published Rosenzweig’s article and—due to the article’s foreseen significance—hosted an online discussion where readers could discuss the article with him and amongst themselves.

Rosenzweig’s significance to the digital history field more generally and to born-digital records and historical scholarship more specifically is indisputable. Rosenzweig, an urban and American historian at the forefront both of scholarly fields and of ways to leverage technology to reach new and expanding historical audiences, founded George Mason University’s Center for History and New Media (CHNM). CHNM was the pioneering home of much of digital history’s new wave of scholars in the late 1990s and early 2000s. Established in 1994, CHNM has the mission of using “digital media and computer technology to democratize history: to incorporate multiple voices, reach diverse audiences, and encourage popular participation in presenting and preserving the past.”147 It remains a significant hub of activity today: developing the citation manager system Zotero, pioneering new publishing platforms, and training and fostering an entire generation of digital historians. Although Rosenzweig died in 2007, his legacy lives on in the now-renamed Roy Rosenzweig Center for History and New Media.148

“Scarcity or Abundance” was Rosenzweig’s contribution to the digital records field. He introduced the problem, summarizing the debates and discussions that had taken place, from archival conversations, Time & Bits, and Into the Future, before bemoaning the lack of interest from historians. Rosenzweig posited that, in part, the “detachment stems from the assumption that these are ‘technical’ problems, which are outside the purview of scholars in the humanities and social sciences. Yet the more important and difficult issues about digital preservation are social, cultural, economic, political, and legal—issues that humanists should excel at.”149 The traditional archival system would break down, he worried, especially in private collections where “preservation cannot begin twenty-five years after the fact.” What if a writer’s heirs found a “pile of unreadable 5¼″ floppy disks with copies of letters and poems written in WordStar for the CP/M operating system or one of the more than fifty now-forgotten word-processing programs used in the late 1980s”?150 Rosenzweig also provided an overview of the Internet Archive, noting its scope and the attending issues of long-term sustainability for a private archive supported at least in part by a philanthropic millionaire.151 Despite these challenges, he was hopeful, noting that “Kahle’s vision of cultural and historical abundance merges the traditional democratic vision of the public library with the resources of the research library and the national archive.” After all, despite the scope of national and research libraries, on-site physical access was necessarily restricted only to those who could physically make it into the reading room.152 He concluded by calling not only for better technical skills amongst historians, but crucially, for collaboration between archivists and historians. The time was now: “If the past is to have an abundant future,” he noted, “historians need to act in the present.”153

Accompanying the essay was the online discussion, which ran for the first two weeks of September 2003 on the American Historical Review website.154 In one of the many sad ironies of digital preservation, the electronic discussion was lost and not integrated with the long-term journal record itself, but it was preserved by the Internet Archive.155 While it was a small conversation of only a dozen authors and twenty-two posts, this reflected both the start of the academic teaching term (early September is not an ideal time for any discussion) and the lack of digital engagement by many historians.156 Discussions included what could tangibly be done, the degree to which historians needed to acquire archival training, the need to reduce professional divides, and insightful points around the degree to which digital information loss was any different from the amount of information lost during all periods. An active participant, Rosenzweig provided tangible avenues to foster collaboration through professional organizations, and he stressed that, while earlier information might end up in an archive through neglect, “the difference with digital data is that it appears if we wait twenty-five years, it may be too late—we could have nothing rather than, say, 10 percent of the data.”157

Historians thus played a complicated role in this emerging field. An early generation, including R. J. Morris, initially helped bring the conversation forward on how historians would be affected. Yet then, between the early to mid-1990s and the early 2000s, historians disappeared from the scene. Part of this perhaps reflected the Anglo-American divide in the historical profession, with much of the leading conversation taking place in the United Kingdom and North American scholars less involved. It may also reflect the early burst of digital history in North America, which was focused on public history topics, a force that would not fully join with web archiving until the terrorist attacks of September 2001.

Conclusion

By 1997, there was a widespread, if elite, cultural consensus in favor of digital preservation. From Sterling’s Dead Media Project to Time & Bits to historians like Rosenzweig, there was an increasing understanding that digital preservation could not simply be the purview of governments and corporate librarians but rather would require large-scale, interdisciplinary collaboration. On the one hand, the digital dark age raised apocalyptic visions of a sundered historical record. Yet, for others, inspired by the spirit of technological utopianism, there arose the prospect that the digital dark age might give way to a digital golden age: a historical record unlike any that the world had ever seen.

With some exceptions, the individuals and organizations discussed in this chapter represented a more elite perspective: technologists, historians, and media theorists. Individual users, facing a 404 error alone on their home computer or the prospect of a lost document, largely did not feature in these conversations. Yet on reflection that is perhaps unsurprising. Everyday users would experience digital loss on an individual basis. The thinkers and writers discussed in this chapter helped to reconceptualize the problem as a much larger social one. It would not just be the individual trauma of losing personal documents, recipes, or correspondence; it would be the collective imperilment of our cultural history. Put this way, it was not a matter of individual tragedy but of a looming digital dark age.

But so far, much of what we have seen in this chapter was theoretical: talk of building capacity, significance, and networks, with occasional calls for action that were not always matched by concrete, deliverable steps. The next step was to build the actual memory infrastructure of the web that could preserve it at scale and then provide access to it. During Documenting the Digital Age in early 1997, held in San Francisco, participants were asked “What now?” The Internet Archive had already begun to answer that question.
