Scott now works as a “free-range archivist and software curator” for the Internet Archive, an online library founded in 1996 by internet pioneer Brewster Kahle to preserve information that might otherwise be lost.
Over the past 20 years, the Internet Archive has amassed a vast library of material collected from across the web, including GeoCities content. And it doesn't preserve only born-digital artifacts; it also holds a vast collection of books that it has scanned and rescued. Since its inception, the Internet Archive has collected over 145 petabytes of data, including more than 95 million public media files such as movies, images, and texts. It has successfully preserved almost 500,000 pages of MTV News.
The Wayback Machine, which lets users rewind and see what a particular website looked like at earlier points in time, has stored over 800 billion web pages, with 650 million new ones captured every day. The archive also holds recordings of TV channels from around the world and videos from TikTok and YouTube. All of it is stored in multiple data centers owned by the Internet Archive.
It's a Sisyphean task: as a society, we're creating so much new stuff that we always have to get rid of more than we did the previous year, says Jack Cushman, director of Harvard's Library Innovation Lab, which helps libraries and technologists learn from each other. “You have to decide what to keep and what not to keep. And how do you decide that?”
Archivists must make such decisions all the time: Which TikToks, for example, should be preserved for posterity?
Niels Brügger, an internet researcher at Aarhus University in Denmark, says we shouldn't try too hard to guess what future historians will find interesting about us. “We can't imagine what historians 30 years from now will want to study about today, because we have no idea,” he says. “So we shouldn't try to anticipate or to some extent limit the questions that future historians will ask.”
Instead, Brügger says, we should preserve as much material as possible and let future researchers work out what to do with it. “As a historian, I would definitely do this: Get everything, and then historians will figure out what they're going to do with it,” he says.
The Internet Archive prioritizes materials that are most at risk of being lost, says Jefferson Bailey, who helps develop archiving software for libraries and institutions. “We prioritize materials that are ephemeral, at risk of being lost, or that are not yet digitized and therefore vulnerable to destruction because they are in analog or print format,” he says.