RJI Fellow working on solution to ‘future proof’ data-driven news applications

ProPublica’s Dollars for Docs is an elaborate data-driven news application that allows readers to interact and engage with it as they learn about the money doctors receive from pharmaceutical companies.

In 50 years, Dollars for Docs may be gone entirely if nothing is done to archive it, says Meredith Broussard, assistant professor at the Arthur L. Carter Journalism Institute at New York University and author of “Artificial Unintelligence: How Computers Misunderstand the World.”

Broussard is on a mission to preserve Dollars for Docs and other data journalism projects like it. As a computer scientist turned data journalist, Broussard says she’s uniquely positioned to address the issue of what University of Minnesota faculty members Kathleen Hansen and Nora Paul call “future-proofing the news.”

Thanks to organizations like the Internet Archive, there is technology to preserve static web pages that contain simple text and data. For more complex interactive sites like Dollars for Docs, the technology does not yet exist, she says.

“Ten years from now, we won’t be able to access some of the most exciting database-driven journalism produced today,” says Broussard.

Database-backed stories, also known as news applications or news apps, are complex because they have so many parts that require preserving together as a package.

“An easy way to think about it is that a news app has a front end that the public sees on the web, plus a back end where the computing happens and the data is stored,” says Broussard.

That’s why she’s developing a tool that combines front-end web crawling technology with the back-end preservation methods used in reproducible scientific research. For this project, which she’s tackling during a 2018-19 fellowship at the Donald W. Reynolds Journalism Institute, her team is using ProPublica’s Dollars for Docs as a test case.

As someone who has written for the web since the mid-1990s, Broussard knows firsthand how easily work is lost online: her older work has largely disappeared. Websites get shut down. Companies lose content as they update their CMS or switch to a different one. Technology becomes outdated.

“Everything online breaks,” she says. “I realized that in 20 years nobody is going to be able to see the amazing online journalism that’s being produced right now. That is going to be a big problem for future historians because history is based on the idea that you can look back and see artifacts from the media. More has disappeared online than most people imagine.

“Today, I can read a whole print newspaper on any day from 1899, but I can’t read everything that The Boston Globe published in print and online for any day in 2007.”

In addition to packaging up apps like Dollars for Docs, Broussard is also creating an electronic repository where the packages may be deposited. Saving one copy on a computer desktop is not good enough.

“Computer scientists refer to it as the truck problem,” she says. “As in, if you get hit by a truck and you have all this knowledge in your head, then the knowledge disappears.”

Newsrooms interested in archiving or preserving their complex or award-winning data journalism are encouraged to get in touch with Broussard via meredithbroussard.com.

She will present her work at the 2019 NICAR conference, which runs March 7-10, along with her research partner Katy Boss, librarian for Journalism and Media, Culture, and Communication at NYU, and archiving activist Ben Welsh, data desk editor at the Los Angeles Times.

