Websites die constantly. The sheer size of the internet makes it feel like a permanent fixture, but individual pages live only an estimated 90 days before they change or vanish. At the same time, every single page has potential historical value. Maybe a future scholar will want to read a local news article that disappeared when the paper redesigned its website, or to check troublesome old statements that a political candidate has since purged. Perhaps someone will just want to revisit a video that made them laugh decades ago.
That anything (and everything) might someday prove valuable is why extensive internet archiving efforts exist. Those include the aptly named Internet Archive, a non-profit digital library that launched in 1996 with the humble mission of providing “universal access to all knowledge.” They’ve since digitized millions of books, videos, audio recordings, and software programs, while their Wayback Machine has saved snapshots of an estimated 544 billion webpages. Here, for example, is what the front page of Discover looked like on June 14, 2007.
The Wayback Machine is an incredible bulwark against websites that die slow deaths wrought by neglect, technological changes, mergers, and other ravages of time. But some websites have their plugs abruptly pulled, and that’s where the Archive Team steps in.
In Case of Emergency
Archive Team, a self-described “loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage,” is a volunteer organization that monitors fading or at-risk sites and saves their contents before they vanish completely. When Google announced the end of its failed social network Google+, the collective saved 1.56 petabytes of its data in under four weeks.
Much of what Archive Team saves is then stored within the Internet Archive, which anyone can use to preserve whatever they feel is important. But while the Wayback Machine uses bots to crawl the web and take snapshots as they go, the Archive Team is laser-focused on preserving endangered sites. It’s the difference between slowly amassing a huge library and trying to save every book from a specific collection that’s about to catch fire.

To accomplish this, the group relies on volunteers: anyone can donate bandwidth and hard drive space to the “Warrior,” an archiving application that systematically downloads sites the group is worried about. Those downloads are then sent to the Archive Team’s servers before being moved to the safety of the Internet Archive. The Warrior’s current projects include the soon-to-shutter Freewebs, a hosting service that has housed 55 million webpages since 2001, as well as certain subreddits that have been quarantined, which is often the first step the discussion website Reddit takes before deleting an entire forum. The conversations within those communities might help researchers understand how, for example, extremist viewpoints spread online.
The Archive Team also provides tips on how to manage your own data, and it encourages you to maintain your own backups. What if you had used MySpace to house precious photos of friends and family, then lost them all when its botched data migration in 2019 accidentally wiped out years of content? You shouldn’t rely solely on sites that could go under tomorrow, but if you make that mistake, well, that’s why groups like the Archive Team exist.
“It's less any particular websites than the stories behind them,” says Jason Scott, an archivist and spokesperson for Archive Team, who points to examples like widows who can read their late spouses' writings without needing a password, or young mothers who stored their children's photos on sites that would otherwise have disappeared. “The human side of all these sites has been frequently forgotten, and we work to make sure they're just a little harder to forget.”
Looking Beyond the Moment
Seemingly iconic fixtures of the internet, like Yahoo Answers and GeoCities, are consigned to oblivion the moment they’re no longer capable of making their owners money. But archivists see the internet as both a living community and a sprawling document of interest to the future. Maybe it’s silly to think of Yahoo Answers, a website that rarely got more intellectually stimulating than asking “how is babby formed,” as having immense historical value, but that’s the point. That goofy question became one of the internet’s great memes; what else could be hiding within the site's depths?
“It's not clear at this point that people understand that the Internet wasn't always a phone-first, of-the-moment, celebrity and gossip and hatred goulash like it is now,” says Scott, explaining that many of the sites the Archive Team saves date back to simpler, earlier eras of the internet. “They may seem sad and static compared to how things are now, [but] they had a huge amount of heart and sense of possibility.”
Social media makes it easier than ever to think of the internet as existing in a perpetual now, where things lose value the moment they’re out of your sight. The Archive Team’s work is a reminder that it’s worthwhile to consider the internet’s past, as well as a potential future that cares more about people and their personal data.
“Most companies seem to think announcing the shutdown is the end of their attention, with just a small amount of time before they direct staff to shut things off and call it all a success,” says Scott. “This feeling of helplessness by the users of these doomed systems is what drives us: to be one other possible situation besides confirmed doom.”