Dodging the Memory Hole 2016: Saving online news

Memory holes and permanent errors: Part 3

Day 3: Updates and large-scale edits

This is part three of a white paper, “Memory Holes and Permanent Errors,” which examines whether and how online news archives should preserve corrections, updates and other post-publication changes. Access part one and part two. Part four will be published Thursday, May 25, along with a PDF of the entire white paper.



Another type of article change to consider in news preservation is the update. Early on, editors realized that the fluidity inherent in the World Wide Web makes this an ideal medium for reporting breaking news. Stories can be updated to reflect information that was previously unavailable, or that was initially mistaken. Unfortunately, this fluidity poses a headache for archivists, who are left to grapple with the question of which is the “canonical” version of the story.

For example, a change log of The New York Times’ initial coverage of the Newtown massacre in Connecticut, compiled by NewsDiffs, records 19 versions of the story between 12:09 p.m. on December 14 and 10:12 p.m. on December 17, 2012. Additions included the number of fatalities, identification of the perpetrator, eyewitness accounts, reporter observations from the scene, information from police briefings, reactions from political leaders and contextual information (such as “among the worst mass killings in United States history”). Other changes included tweaks to the headline and deletion of phrases that were no longer relevant (“those reports could not be immediately confirmed”; NewsDiffs, 2012). ProQuest carries the final text, though it gives a December 14 dateline (Barron, 2012).

Most casual readers may not need it, but students and researchers in media studies would likely benefit from seeing how breaking news stories are reported in their earliest stages and how those stories then evolve. In the future, we will need to understand “not just what happened, but how the processes of journalism shaped and affected what happened” (Clifford Lynch, Dodging the Memory Hole 2014 conference, cited in McCain, 2015). Therefore, key questions for update preservation include:

  1. Should we save any story versions that predate the final version?
  2. How many and what versions should be saved?
  3. How should different versions be displayed? How should changes be noted?
  4. Inevitably, when covering a breaking news situation, some reported information will turn out to be false. How should this be handled — the same as any other update? As an error? Or as some kind of hybrid?


Publishers describe online story updates as a continual practice. “A huge percentage of the articles that we publish online get revised, and updated, and improved, and expanded, and re-edited, during the course of the day,” The New York Times’ Corbett says. “That’s not a rare thing. That’s what we do. That’s a constant thing.” The motivation is to give the reader the most accurate and complete picture possible at each point in time, so Corbett says he has little incentive to flag up non-correction changes or to give readers the option of reading different versions. “Honestly, I think in the overwhelming percentage of cases, for the overwhelming majority of readers, they don’t care. They come at 3 o’clock in the afternoon, they want to see what can The New York Times tell me about this topic at 3 o’clock in the afternoon. They really don’t care what we told readers about that topic at 10 o’clock in the morning. They don’t care what additional quotes have been added, what other quotes have been taken out to make room for the new quotes, what’s been moved up, what’s been moved down.”

The Los Angeles Times takes a different approach, adding an “Update” line with a timestamp to notify readers when an article has been changed, for example to reflect new information on the number of injured in an accident or to add a quote. At the same time, Los Angeles Times library director Cary Schneider questions whether these changes really need to be documented in archives. “I think most people want to know the facts, and they really don’t care, for example, how a death toll changed every few minutes, as reported by one paper,” he says. And he foresees another problem with creating versioned archives. “If there’s a correction that has to run, it has to be attached to all those versions. All of those versions would be replaced with a correction on them,” he says.

But the fluidity of online news presents a hurdle for archivists, argues Hjalmar Gislason, vice president of data for analytic software company Qlik, and chairman at Icelandic media company Kjarninn. “It almost puts the archivist in the situation where you need to first of all acknowledge the fact that you are storing just one version of a story, and it’s not 100% coverage… To properly archive, the only way to do it today is to frequently scan all of the articles.” Gislason says the lack of multi-version archiving is a problem because the changes publications make can be misleading (as discussed in “Large-scale edits,” below), but also because it’s undesirable for different readers to disagree on what the story actually said – and for them all to be right. “From the archivists’ and readers’ side, it’s more that somebody may refer to a version of the article that was different from what somebody else saw. It might be very confusing, and reflect badly on the person referring to it.”

Gislason would like to see news preservation apply something like the NewsDiffs approach: revisiting publications at different intervals, and showing readers what’s changed. But he acknowledges that this will take more resources than memory institutions currently have at their disposal. “Many of them are struggling to get to the point where they can archive [online news] at all,” Gislason says.
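The interval-based approach Gislason describes can be sketched in a few lines. The following is a minimal illustration and not NewsDiffs’ actual code: it assumes that snapshots of an article’s text have already been fetched at different times, the two sample snapshots are invented, and the function names (`fingerprint`, `diff_versions`) are hypothetical. It uses only Python’s standard library.

```python
import difflib
import hashlib


def fingerprint(text: str) -> str:
    """Return a SHA-256 digest so unchanged snapshots can be skipped cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_versions(old: str, new: str) -> list[str]:
    """Return unified-diff lines describing how `new` differs from `old`."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="earlier snapshot",
            tofile="later snapshot",
            lineterm="",
        )
    )


# Two invented snapshots of the same story, standing in for crawled pages.
earlier = (
    "Police responded to reports of a shooting.\n"
    "Those reports could not be immediately confirmed."
)
later = (
    "Police responded to reports of a shooting.\n"
    "Officials confirmed the reports at a briefing."
)

# Only compute and show a diff when the fingerprints differ.
if fingerprint(earlier) != fingerprint(later):
    for line in diff_versions(earlier, later):
        print(line)
```

A comparison like this only sees the snapshots it is given. Any version published and replaced between two visits leaves no trace, which is why the crawl interval matters.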

The Los Angeles Times’ Ben Welsh says that creating versioned archives of news stories requires surmounting several barriers. First, the news organization actually has to save multiple versions of its stories, which is not necessarily the case at present. Both the Los Angeles Times and The New York Times save multiple versions in their CMSs, but publishers also have to anticipate frequent changes to those CMSs and ensure all articles are saved in a standardized, long-lasting format. Second, the publisher must decide what sort of user interface it wants, and third, it must decide how code will surface the various versions from its database. Finally, Welsh says, “You’d have to have the will [to] do it, which is often the most difficult thing.”

NewsDiffs offers one potential model for a user interface, Welsh says. In a similar vein, the Twitter account @NYT_diff tweets a snapshot of the changes made when an article is updated. These projects work independently of publishers, an approach that has advantages and disadvantages. On the one hand, third parties can act as neutral watchdogs; on the other hand, their resources are limited. A news outlet that saves all past versions of a story knows how many there are, so it could theoretically share them all with the public. A third-party service like NewsDiffs can only crawl the news site at intervals, and may not pick up on every change.

The Memento protocol, created by Herbert Van de Sompel and Robert Sanderson of the Los Alamos National Laboratory together with Michael Nelson of Old Dominion University, offers another potential way to surface changes. When a user visits a webpage that supports the Memento framework, they can see past versions of the page by specifying a date and time. If newsroom CMSs had Memento built in, they could receive and respond to these requests. In fact, Welsh has built a Memento plug-in for WordPress and says it wouldn’t be difficult to build the protocol into newsroom CMSs.
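As a rough sketch of what such a request involves: in the Memento specification (RFC 7089), a client states the desired date and time in an `Accept-Datetime` HTTP request header, and a server that supports the protocol selects an archived version for that moment. The example below is illustrative only and makes no network calls. The in-memory `versions` list and the `select_version` helper are hypothetical stand-ins for a CMS’s version store, the timestamps are invented, and returning the latest version at or before the requested time is just one simple selection policy assumed here.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def accept_datetime_header(moment: datetime) -> dict[str, str]:
    """Build an Accept-Datetime header (HTTP date, GMT) from an aware datetime."""
    as_utc = moment.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(as_utc, usegmt=True)}


def select_version(versions, header_value):
    """Return the text of the latest stored version at or before the requested time.

    `versions` is a list of (aware datetime, text) pairs; returns None when
    every stored version is newer than the requested time.
    """
    requested = parsedate_to_datetime(header_value)
    candidates = [(saved_at, text) for saved_at, text in versions if saved_at <= requested]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[0])[1]


# Hypothetical version store with invented timestamps.
versions = [
    (datetime(2016, 1, 1, 17, 9, tzinfo=timezone.utc), "Version 1: early report"),
    (datetime(2016, 1, 1, 21, 30, tzinfo=timezone.utc), "Version 2: updated report"),
]

headers = accept_datetime_header(datetime(2016, 1, 1, 20, 0, tzinfo=timezone.utc))
print(headers["Accept-Datetime"])
print(select_version(versions, headers["Accept-Datetime"]))
```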

But Welsh notes that such a system would be “content neutral.” Simply showing readers that an article has changed doesn’t alert them to whether the change was due to an update, correction, or something else. Making those distinctions might require the type of standardized metadata discussed in the Correction section, above.
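One way to picture such metadata is a version record that carries a change type alongside the text. The sketch below is purely hypothetical: the field names, the three categories and the sample history are assumptions made for illustration, not an existing standard or any publisher’s schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class ChangeType(Enum):
    """Illustrative categories for why a version differs from the previous one."""
    UPDATE = "update"
    CORRECTION = "correction"
    OTHER = "other"


@dataclass(frozen=True)
class ArticleVersion:
    saved_at: datetime
    change_type: ChangeType
    note: str  # short human-readable explanation of the change
    text: str


def corrections(history):
    """Return only the versions that were saved because of a correction."""
    return [v for v in history if v.change_type is ChangeType.CORRECTION]


# Invented history for a single article.
history = [
    ArticleVersion(datetime(2016, 1, 1, 17, 9, tzinfo=timezone.utc),
                   ChangeType.OTHER, "Initial publication", "First text"),
    ArticleVersion(datetime(2016, 1, 1, 18, 0, tzinfo=timezone.utc),
                   ChangeType.UPDATE, "Added a quote", "Second text"),
    ArticleVersion(datetime(2016, 1, 2, 9, 0, tzinfo=timezone.utc),
                   ChangeType.CORRECTION, "Fixed a misspelled name", "Third text"),
]

for version in corrections(history):
    print(version.saved_at.isoformat(), version.note)
```

With a field like `change_type`, an archive interface could let readers filter a story’s history by the kind of change rather than showing every revision as equivalent.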

Large-scale edits


Finally, we should consider cases where news outlets make significant changes to a piece, for reasons other than errors or breaking news. One such situation that has received particular attention is “stealth editing”: when a news outlet fails to alert readers to the changes made. The changes that come in for the most criticism tend to be those that alter the tone or meaning of a story.

Famous examples include The New York Times’ changes to an article on Bernie Sanders, making the piece more skeptical of the candidate’s prospects, and more measured in its praise. Hundreds of readers expressed disapproval of the change (Sullivan, 2016). In another example, The New York Times was widely accused of sexism when it began the obituary of a scientific pioneer thus: “She made a mean beef stroganoff, followed her husband from job to job and took eight years off from work to raise three children” (NewsDiffs, 2013). It later moved the words “brilliant rocket scientist” from the second paragraph to the first, replacing the beef stroganoff.

The New York Times has been particularly prominent in these discussions, perhaps because of its reach and prestige, and perhaps because of the frankness of its public editors, who have disagreed with these unmarked changes. But other outlets do it too. For example, Politico was criticized when it deleted paragraphs critiquing the media-handling skills of the then-commander of U.S. forces in Afghanistan, General Stanley McChrystal (Hendler, 2010). It’s difficult to say how often these changes happen, however, or which outlets do it most frequently. NewsDiffs finds hundreds of changes per day in its five target outlets, but most are minor tweaks or updates. The prevalence of “stealth” editing is, as the name implies, unknown.

The reasons news outlets give for stealth edits (when they are forced to give a reason) vary. The organization might say the article was not balanced, lacked context or more generally “did not meet our standards.” It might argue it has an obligation to improve the grammar, flow, tone and so on, in the same way as it would for an unpublished piece. Critics often allege more insidious motives, such as the paper bowing to pressure from politicians or advertisers. Such charges are difficult to prove, and the news outlets in question usually deny these motivations.

Key questions about the role of large-scale edits in online news preservation include:

  1. Should we save any story versions that predate the final version?
  2. How many and what versions should be saved?
  3. How should different versions be displayed? How should changes be noted?


Here, the question for preservation is less the potential of archives to spread misinformation and more how much transparency we should be demanding from news organizations. Current New York Times public editor Liz Spayd says readers are “far more sophisticated than they’re given credit for” — they do notice changes, and they want these explained. Giving them those answers “conveys openness and reduces suspicion,” Spayd says. She compares substantive unflagged changes to “a doctor who sits a patient down to discuss troubling test results, then later rethinks that analysis and mails the patient a more accurate interpretation of the results, without highlighting the changes” (Spayd, 2016b). Her predecessor Margaret Sullivan argued that except for breaking news, “digital platforms … are not a test run,” so editors should take the time to get the pieces right before publication (Sullivan, 2016). Fortune’s Matthew Ingram (2016) sums up that point succinctly: “Editing stories after publication doesn’t build trust.”

A poignant example, albeit one that editors did draw attention to, appeared in The New York Times in July 2016. At first, online readers of the piece by Georgetown University professor Michael Eric Dyson found a sharp critique, describing “an undeclared war against blackness” (Spayd, 2016a). But then an African-American sniper in Dallas killed five police officers, four white, one Mexican-American, and the piece was radically changed. The undeclared war became “racial justice feels elusive.” Editors appended a brief note about the change, but some readers were angry. They thought the explanation was too brief and too vague and asked whether opinion pieces should ever be changed retroactively in this way. “And what about the ‘record,’ or perhaps the imprint, made by the originally published piece?” asked occasional Times freelancer Rand Richards Cooper. “Is it simply gone forever?” (Spayd, 2016a). It is indeed arguable that future historians, students and journalists would benefit from seeing both versions of the piece, especially given the national prominence of these issues and of the events in Dallas.

On the other hand, Corbett argues the impracticality of highlighting all changes for readers. “Both in technical terms and in terms of how the newsroom functions, and how readers read our material, I think to very frequently be putting notes onto stories saying this story has changed because we had this new information, or an editor decided that such and such a point was less important, so he moved it down in the story and moved this other point up instead — aside from other journalists, and media critics and academics, I’m not sure how many readers would really, truly benefit from that on a routine basis.” And he argues that if editors can improve a piece, they should. Margaret Sullivan concedes that if The New York Times put an editor’s note on every article it changed for non-error reasons, “Nearly every article would require an accompanying editor’s note about why, for example, a particular quotation was added or another removed, and so on. Or various versions would need to be available online” (Sullivan, 2013).

Many of the technical ideas and hurdles discussed in Updates, above, also apply to large-scale edits. But Qlik’s Hjalmar Gislason says the argument for versioned archiving is particularly strong when it comes to large-scale changes. When newspapers don’t own up to the need to make major overhauls to a piece, Gislason says, “That undermines that publication in my mind. They should be more transparent.” The incentives for documenting large-scale edits, however, can be very different from the incentives for documenting updates. When we know that publications often lack the will to show their readers how breaking news changed, how can we expect them to detail every single misstep in their reporting and editing?

Some might argue that this is exactly why third-party projects like NewsDiffs exist: to do the documentation that publishers won’t. Perhaps this is a task that memory institutions should take on and should seek funding for. But it could arguably be dangerous for librarians and archivists to set themselves up in this somewhat adversarial role.


Barron, J. (2012, December 14). Nation reels after gunman massacres 20 children at school in Connecticut. New York Times (online). Retrieved from

Hendler, C. (2010, June 23). A Politico graf goes missing. Columbia Journalism Review. Retrieved from

McCain, E. (2015). Plans to save born-digital news content examined. Newspaper Research Journal, 36(3), 337–347.

NewsDiffs. (2013). Comparing: Yvonne Brill, a pioneering rocket scientist, dies at 88. [Change log]. Retrieved from

NewsDiffs. (2012). Gunman massacres 20 children at school in Connecticut; 28 dead, including killer (NYT), Change log. Retrieved from

Spayd, L. (2016a, July 12). Opinion editors tone down piece on race, and some readers cry foul. New York Times. Retrieved from

Spayd, L. (2016b, September 24). Taking the stealth out of editing. New York Times. Retrieved from

Sullivan, M. (2016, March 17). Were changes to Sanders article ‘stealth editing’? New York Times. Retrieved from

Sullivan, M. (2013, October 14). Is the Times being stealthy? Or just improving its reporting in real time? New York Times. Retrieved from

All references cited will be available in the final PDF download.

