Gotta catch ’em all

Gotta catch ’em all: Archiving digital content such as social media should include linked objects

In the digital media world, there’s no guarantee that material that appears one minute will be there the next.

Take, for example, the Twitter posts of Anthony Scaramucci, who began deleting old tweets after being named White House communications director. Many, but not all, of the former presidential aide’s postings had already been identified and collected by the diligent folks at the Internet Archive’s Wayback Machine. Unfortunately, some of the content linked in those archived tweets was not captured, so the archive shows an error message in place of the original digital object, such as a video.

An archived Twitter page shows Anthony Scaramucci’s erased tweets. Linked digital objects that were not archived do not show up in the Wayback Machine’s rendering.

For journalists who want to document comments, photos, videos or other materials, the temporary nature of online content presents a daunting barrier to providing digital evidence.

In the fast-moving and easily edited environment of social media, the problem is exacerbated because content can be removed by the person posting on, say, Twitter. Data scientist Kalev Leetaru estimates Twitter volume at 477 million tweets per day.

Leetaru’s examination of tweets kept in the Wayback Machine indicates that even at one of the most advanced web archives, “as little as 30 percent of those links are currently being preserved.

“Preserving the JSON of a tweet without resolving its short links back to their original forms is no different than archiving a webpage’s HTML without also preserving the images and other content it embeds.”
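To make Leetaru’s point concrete, here is a minimal Python sketch of pulling the full link targets out of a tweet’s JSON so they can be archived alongside the tweet itself. It assumes the tweet follows the classic Twitter API layout, in which each entry under `entities.urls` carries a short `url` and an `expanded_url`. The function name `linked_urls` and the sample payload are hypothetical, invented for illustration.

```python
import json


def linked_urls(tweet: dict) -> list[str]:
    """Return the full URLs a tweet links to, falling back to the short link."""
    urls = []
    for entry in tweet.get("entities", {}).get("urls", []):
        # Prefer the resolved address; keep the short link if that is all we have.
        target = entry.get("expanded_url") or entry.get("url")
        if target and target not in urls:
            urls.append(target)
    return urls


# Hypothetical tweet payload, trimmed to the fields used here.
sample = json.loads("""
{
  "id_str": "1",
  "text": "Watch this https://t.co/abc123",
  "entities": {
    "urls": [
      {"url": "https://t.co/abc123",
       "expanded_url": "https://example.com/video.mp4"}
    ]
  }
}
""")

print(linked_urls(sample))
```

Each address returned this way is a candidate for its own capture. Saving only the tweet’s JSON would keep the short link but not the video behind it.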

What are the options for a reporter who wants to capture a collection of online content and later display it as it originally looked?

Easy options include taking screenshots, making printouts, exporting the pages as a PDF or entering a URL into the Internet Archive’s Save Page Now function. But none provide a way to replicate the original experience of exploring and interacting with live linked content, including media and interactive objects.
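For reporters comfortable with a little scripting, the Save Page Now step can be sketched in Python. This assumes that a capture can be requested by loading `https://web.archive.org/save/` followed by the page address. Rate limits, login requirements and response details may differ in practice, so treat it as an illustration rather than a description of the service’s official interface. The function names and the `User-Agent` string are hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumed address prefix for Save Page Now requests.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_page_now_url(page_url: str) -> str:
    """Build the Save Page Now address for a page."""
    # Leave common URL punctuation alone; escape characters such as spaces.
    return SAVE_ENDPOINT + quote(page_url, safe="/:?&=#%+,;@")


def request_capture(page_url: str, timeout: float = 60.0) -> int:
    """Ask the Wayback Machine to capture a page and return the HTTP status code."""
    req = Request(
        save_page_now_url(page_url),
        headers={"User-Agent": "archive-sketch/0.1"},  # hypothetical identifier
    )
    with urlopen(req, timeout=timeout) as resp:
        return resp.status


print(save_page_now_url("https://example.com/press release.html"))
```

Only `save_page_now_url` runs when the script is loaded; calling `request_capture` contacts the Internet Archive. As the article notes, a capture made this way still preserves a single page, not the experience of clicking through its linked media.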

A free web archiving tool called Webrecorder may help change that situation. The browser-based application provides a quick and easy way to collect and preserve dynamic online information, including social media, videos and interactive content. One key feature is that the tool captures almost anything the user scrolls to or clicks while it is recording. Webrecorder comes with up to 5 GB of free storage.

Besides keeping public officials honest, Webrecorder provides journalists with a means of keeping a record of officials’ digital output. Digital news content is subject to a host of factors that make it unlikely to remain accessible for more than a few years. Risks to digital content include changing content management systems, obsolete formats, hacking, hardware failures and evolving delivery systems.

Journalists who wish to keep an archive of their own work or someone else’s Twitter feed will find Webrecorder useful, especially for preserving a visually accurate record of a website. Using the application is fairly straightforward: Turn on the recording mode, then scroll and click your way around the browser window. If you see a link, click it and follow it as far down the rabbit hole as you deem necessary. Anything you do not scroll to or click will not be recorded and will not be accessible. What you see and what you saw is what you will get later during playback.

There is no time limit for recordings. The only limit is the amount of space available for the captured content. The files, which are stored in WARC — a standard web archive format — can be organized into custom collections on the Webrecorder site.
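For the curious, a WARC file is a sequence of records, each with a version line, a block of named headers, a blank line and then exactly `Content-Length` bytes of captured content. The Python sketch below lists the records in an uncompressed WARC stream. It is a simplified reader written for illustration, not Webrecorder’s own code: it ignores folded header lines, and the sample records built by the hypothetical helper `make_record` omit fields a complete record would carry, such as the record ID and date.

```python
import io


def iter_warc_records(stream):
    """Yield (headers, payload) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"unexpected line: {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # The payload length comes from the header, so binary content is safe.
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


def make_record(uri: str, body: bytes) -> bytes:
    """Build a trimmed-down sample record (hypothetical helper for the demo)."""
    head = (
        "WARC/1.0\r\n"
        "WARC-Type: resource\r\n"
        f"WARC-Target-URI: {uri}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("utf-8")
    return head + body + b"\r\n\r\n"


sample = io.BytesIO(
    make_record("https://example.com/", b"<html>hello</html>")
    + make_record("https://example.com/clip.mp4", b"\x00\x01\x02")
)
for headers, payload in iter_warc_records(sample):
    print(headers["WARC-Type"], headers["WARC-Target-URI"], len(payload))
```

WARC files are often distributed gzip-compressed; in that case the file can typically be opened with Python’s `gzip.open(path, "rb")` and the resulting stream passed to the same function.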

Rhizome, the organization behind Webrecorder, recently released a downloadable tool that allows users to play back their recordings on their own desktop. This makes it possible to keep copies of WARC files in more than one location, which helps avoid the risks of relying on a single copy of an important resource.
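Keeping several copies only helps if each copy is intact. One common approach, sketched here in Python with standard-library calls, is to copy the file to each location and compare SHA-256 checksums against the original. The function names and any folder paths are hypothetical; nothing here is specific to Webrecorder.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute a file's SHA-256 checksum, reading it in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source: Path, destinations: list[Path]) -> dict[Path, bool]:
    """Copy source into each destination folder and report whether each copy matches."""
    expected = sha256_of(source)
    results = {}
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(source, folder / source.name))
        results[copy] = sha256_of(copy) == expected
    return results
```

A call such as `replicate(Path("collection.warc"), [Path("/mnt/backup"), Path("/mnt/offsite")])` would return a mapping from each new copy to `True` or `False`. Rerunning the checksum comparison from time to time can also reveal a copy that has been silently damaged.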

Webrecorder began as a project to preserve born-digital artworks, including online installations. Rhizome, a 20-year-old nonprofit based in New York, received a $200,000 grant from the Knight Foundation in 2016 to expand the Webrecorder project.

