Machine learning for journalism: Examples and myths
Imagine supervising two dozen tirelessly eager, yet hopelessly naïve rookies. They can be trusted to work quickly and with great precision, especially on simple tasks. But sometimes their results don't make any sense. You can't really blame them, though. They're just doing their best with the examples you've provided, and they lack the journalistic experience (or life experience) that guides your own judgments. So it is when working with machine learning, according to John Keefe of the Quartz AI Studio. Keefe used this analogy to introduce the fundamental concepts and practical applications of machine learning to those of us attending his workshop at the Craig Newmark Graduate School of Journalism at the City University of New York.
Keefe said machine learning is useful for any journalist confronted with an insurmountable pile of data in any form: documents, images, videos, whatever. In these situations, machine learning can help you find needles in haystacks or insightful patterns that lead to enterprising story ideas.
At BuzzFeed, Peter Aldhous trained a computer to recognize the flight patterns of known spy planes operated by the FBI and the Department of Homeland Security. BuzzFeed then set its trained model loose on four months of flight data compiled by the website Flightradar24. As a result, they found US Marshals hunting drug cartel associates and military contractors surveilling US cities. Check out their code here.
At Quartz, Jeremy Merrill and Natasha Frost used machine learning to point out the risk factors named in Lyft's IPO filing that are unique among the financial filings from all companies in the S&P 500.
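Quartz's actual method isn't described here. As a rough illustration of the general idea, one simple way to surface "unusual" passages is to score each sentence by how rare its words are across a set of comparison documents (inverse document frequency). The sketch below assumes that approach; the function names and the toy sentences are hypothetical and not drawn from any real filing.

```python
import math
import re


def tokenize(text):
    """Lowercase a string and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def rarity_scores(target_sentences, other_filings):
    """Rank target sentences by the average rarity (IDF) of their words.

    Words that appear in few of the other filings get a high score, so
    sentences built from unusual words rise to the top of the list.
    """
    n_docs = len(other_filings)
    doc_vocab = [set(tokenize(doc)) for doc in other_filings]

    def idf(word):
        doc_freq = sum(1 for vocab in doc_vocab if word in vocab)
        # Add-one smoothing keeps the score finite for unseen words.
        return math.log((1 + n_docs) / (1 + doc_freq))

    scored = []
    for sentence in target_sentences:
        words = tokenize(sentence)
        score = sum(idf(w) for w in words) / len(words) if words else 0.0
        scored.append((score, sentence))
    # Highest average rarity first.
    return sorted(scored, reverse=True)


# Hypothetical toy data standing in for real filings.
others = [
    "Our business depends on general economic conditions.",
    "Competition and economic conditions may harm our business.",
    "We depend on key personnel and general market conditions.",
]
target = [
    "Our business depends on general economic conditions.",
    "Drivers may be reclassified as employees.",
]

ranked = rarity_scores(target, others)
most_unusual = ranked[0][1]
```

A score like this only flags candidates. A reporter would still need to read the top-ranked passages to decide which ones are genuinely newsworthy.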
At the Atlanta Journal-Constitution, a team of reporters collected over 100,000 disciplinary records from state medical boards and related regulatory agencies, a first-of-its-kind national database. They then read a substantial subset of those reports, labeled which ones involved sexual misconduct by physicians, then trained an algorithm to find other reports with similar keywords. Ultimately, the AJC's year-long investigation uncovered 450 cases of alleged sexual misconduct by doctors in 2016 and 2017. Read here to find out more about how the project came together.
At the Ukraine-based texty.org.ua, journalists used machine learning to sift through satellite imagery over a 30,000-square-mile area of northwestern Ukraine in order to pinpoint places impacted by a rush of illegal amber mining. The ongoing environmental crisis has transformed hectares of forest and farmland into lifeless, moonlike desert. More on texty's methods here.
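The label-then-train workflow described above can be sketched in a few lines. This is not the AJC's code, and the article doesn't say which algorithm they used; the example below assumes a small naive Bayes classifier written with only the standard library, and the class name, labels ("flag", "skip"), and sample reports are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """A tiny multinomial naive Bayes classifier for hand-labeled documents."""

    def fit(self, documents, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(documents, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, document):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for c in self.classes:
            # Log prior: how common this label was among labeled examples.
            score = math.log(self.doc_counts[c] / total_docs)
            total_words = sum(self.word_counts[c].values())
            for word in tokenize(document):
                if word not in self.vocab:
                    continue  # ignore words never seen during training
                # Laplace (add-one) smoothing avoids log(0).
                count = self.word_counts[c][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Hypothetical hand-labeled examples; a real project would need far more.
labeled_reports = [
    ("Physician engaged in inappropriate sexual contact with a patient.", "flag"),
    ("Doctor made sexual advances toward a patient during an exam.", "flag"),
    ("Physician failed to maintain adequate medical records.", "skip"),
    ("Doctor prescribed controlled substances without proper records.", "skip"),
]
texts = [text for text, _ in labeled_reports]
labels = [label for _, label in labeled_reports]

model = NaiveBayesTextClassifier().fit(texts, labels)
first = model.predict("Board found inappropriate contact with a patient during an exam.")
second = model.predict("Physician failed to keep proper records of controlled substances.")
```

In a setup like this, the model's output is best treated as a reading list for reporters rather than a finding: every flagged record still needs human review.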
Keefe also debunked a few pervasive myths about machine learning:
- Myth: Machine learning only works on huge datasets.
- Truth: We can apply models already trained on huge datasets to datasets of the size journalists tend to be able to scrape together.
- Myth: Machine learning is a “black box” that makes it difficult to know what is happening.
- Truth: This myth serves the interests of powerful companies who want to sell us the black boxes. While there's a lot of computation happening behind the scenes, the math would feel familiar to anyone who's ever taken a calculus course, however long ago that may have been.
- Myth: You need a super-powerful computer to do machine learning.
- Truth: You just need a computer with a graphics processing unit (GPU), the likes of which you'll find on any high-end gaming or video-editing laptop. The cheaper route is to just rent the computing power you need, when you need it, from either Google or Amazon.
- Myth: You need advanced computer programming skills in order to do machine learning.
- Truth: Writing code isn't the hardest part of machine learning, especially when you are working in a language like Python that offers great libraries that handle most of the arcane stuff.
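On the "black box" myth above, here is a minimal sketch of the kind of calculus involved. It assumes the simplest possible model, a straight line fit by gradient descent on mean squared error; the function name, learning rate, and toy data are illustrative choices, not something presented in the workshop.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step "downhill" against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)
```

Neural networks repeat this same derivative-and-step loop across many more parameters, which is why the underlying math is less mysterious than the "black box" label suggests.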
To drive home the point that writing code isn't the main hurdle, Keefe introduced us to fast.ai, an organization whose aim is to “make neural networks uncool again” by democratizing access to pre-trained models and research-based best practices. Their free online course is a great starting place for coders who want to have a go at machine learning.