Guardians of the Deleted: The Wayback Machine’s Quiet War for the Public Record
Good evening, everyone:
Amy sent me an absolutely fascinating article from National Public Radio about where data goes when nobody wants it – or, more precisely, when someone wants to be rid of it. Reported by Emma Bowman, the piece is entitled “As the Trump administration purges web pages, this group is rushing to save them.” Bottom line: the Internet Archive, based in San Francisco, has catalogued some 73,000 pages that lived on U.S. government websites and have been purged over the early months of the Trump Administration’s tenure. Its web archive is known as the “Wayback Machine.”
The nonprofit, founded in 1996, is a digital library of internet sites and cultural artifacts. Its holdings include hundreds of billions of copies of government websites, news articles, and other forms of data. The Wayback Machine is the Archive's access point to nearly three decades of web history. Its rather surprising location is not a warehouse of servers in South Dakota, but an old Christian Science church near the Golden Gate Bridge.

When you enter, you see three imposing vertical stacks of servers that are the heart of the Wayback Machine:

Here, with the director, Mark Graham
The servers are recording the World Wide Web in real time. Every day (and you can check with Wendy M. to verify just how big this is) 100 terabytes of material are uploaded, which comes to approximately one billion URLs. The vast majority of that material is fed into the Wayback Machine; the rest is digitized analog media: books, magazines, television, radio, memoirs by obscure foundation presidents, and the like.
All of this positions the Wayback Machine fortuitously in the current Orwellian/Fahrenheit 451/“Silo” environment. Want to know which executive left out which critical phrases about climate change, gender preference, or diversity? The Wayback Machine can tell you. Want to know what historical records were completely erased? Same. Want to understand how current regulatory guidance compares with that of the Biden Administration? Same. Want to see a copy of the interactive timeline prepared by the January 6th Committee? Same.
Understandably, therefore, the Archive is drawing dramatically expanded interest these days. Interestingly, some are trying to upload data before it disappears. And the Archive itself is deeply interested in getting out in front of potential purges. Reporter Bowman notes a particularly compelling case of direct interest to Monica and our Health team:
Nancy Krieger, a social epidemiologist at Harvard University . . . has teamed up with other scientists to try to preserve federal health data that has recently disappeared from government websites. She helped develop a list of terms to send to the Internet Archive to aid the search and preservation effort. ‘We want to preserve public health data that are crucial for people's well-being,’ she told NPR.
For example, she noted, there's a web page on the Centers for Disease Control and Prevention's site titled "Ending Gender-Based Violence." It highlights CDC research showing that adolescent girls and young women bear a disproportionate burden of HIV cases worldwide, an issue driven by gender-based violence and poor access to health services. The page, which was accessible on Jan. 16 prior to Trump's inauguration, now reads "page not found."
The good news is that most of what the Internet Archive slurps into the Wayback Machine, along with the bulk of the broader Archive, becomes available to the public with minimal delay. But the Archive's job has become harder and harder. And who knows what additional scrutiny it will attract, and with what consequence.
The not-so-good news is that the web is filled with severed links – the dreaded “404 Page Not Found” message. Indeed, almost 40% of the web pages in existence a decade ago are no longer accessible. That’s a loss of a staggering amount of source material. So the Wayback Machine is seeking to keep us from having to, in the words of the Archive’s director, “build our culture on shifting sands.”
But despite the hazards and limitations, what a treasure, and one that I suspect very few of us knew about. It is an important cog in the civic machinery of our democracy.
Rip