Weihua Li is a data reporter for The Marshall Project. She uses analysis and visualization to report about the criminal justice system.

The people behind the data points

Weihua Li of The Marshall Project balances the severity and humanity of incarceration through data

Miesner: What is your role at The Marshall Project? 

Li: I am a data reporter at The Marshall Project, which specifically means that I use data to tell stories and to explain what’s going on with the criminal justice system. I do a lot of data analysis, data visualization, as well as data reporting — which is talking specifically to the people who are behind the data points, to get a better understanding of what’s going on behind the stories.  

Miesner: Why are you passionate about data storytelling? 

Li: People don’t read numbers, they see numbers. If you show me two numbers next to one another, it will take me a moment to realize which one is larger, which one is smaller; if you put two bars in front of me, I can tell right away.  

I think there’s something special about data visualization and the way that the dataset wants to be told. There are special moments when a story will have many details, but there just isn’t a way to tell it without a visualization, and piecing everything together is an incredibly fulfilling moment. 

Miesner: How does the issue you are reporting on inform the data visualization you are creating? 

Li: The datasets that I work with are usually about the darkest moments of people’s lives, whether you’re working with incarceration data, crime data, or police use of force. There is a severity to it, and there is also a humanity to it, so it’s important to me, personally, not to lose sight of that and not to just see it as a number. 

Sometimes that does happen: you see a spreadsheet and you don’t think twice about it. But my colleagues and I always make an effort to read and report more behind the numbers so we know the severity of it, even though on paper it doesn’t look so serious. 

Miesner: Why do you think it’s important that we don’t lose focus of the people behind the data? 

Li: I think it’s important because journalism is different; our audience, the public, holds us accountable. We are not satisfied with just showing a stack of stats and calling it a day. Showing the disparity of that is not enough. What’s the real harm done? What’s the real story? 

If x number of people are put in prison for life because of this law, you need to do them justice by telling their stories. Especially covering criminal justice, this is really important because our data is usually about people whose voice is not often heard. It’s about people who are put into the system and then labeled very generically. We have to help tell their stories. Data analysis and data visualization are tools for me to tell their stories. 

Miesner: How do you work to show the severity of incarceration through data? Do you have specific methods, ways of visualizing, or things you always include or check? 

Li: I’m not sure if we’ve done enough of that through data and data visualization, but we have done something that our newsroom calls the carousels, which is essentially cards that we build out. Each card is one instance. We have this story about how Black girls are being disproportionately targeted in police use of force instances. 

In a study in New Orleans, every single instance of police force against girls involved a Black girl. So to showcase what that experience is like, data is not enough. One of my colleagues conducted dozens of interviews with Black girls talking about their experience and we built it into the article, with each girl being a slide, where people can see for themselves that it’s not just one instance, it’s recurring.

When police forces are collecting data, they aren’t looking to document the experience of the citizen. They are trying to see what happened and have any proof in the instance of a legal case.  What they are hoping to do and what we are hoping to do are very different, which means that the data they collect is not everything we are looking for. 

Miesner: How do you think journalists can strive to present data while keeping the viewer aware of and engaged with the human context of it? What do journalists need to do more of to achieve that?

Li: Many people come to The Marshall Project because they have been deeply touched by the criminal justice system or are just interested in the state of policing in general. So we have deeply reported stories where we look at the people who are involved, and we are increasing our efforts in making engagement tools and creating data releases to empower those who are affected by the data. 

Miesner: What type of data analysis and visualization tools do you use?

Li: I mainly use Python for my data analysis. At this point we have a rig where we then query the S3 data through a program such as Observable to do a quick data analysis or visualization. Because The Marshall Project is a hugely collaborative newsroom, I love working with other people and I think that sharing what I am finding with my colleagues who don’t know how to code is very important. For quick visualizations with my colleagues, I tend to use D3, and for publication I typically use D3 and JavaScript. 

We also put these visualizations into an in-house graphics rig that we share with other publications that have a CMS (content management system) that isn’t as flexible as the custom one here at The Marshall Project.

Miesner: If someone doesn’t have the time to learn Python, do you have any favorite tools you’d suggest for them? 

Li: I think Tableau is great. I think DataWrapper has a great low barrier of entry and that you can get very far without putting in much effort. Python is just a tool. It’s just something you use to accomplish your goal. 

If you look at my work, none of my data visualizations tend to be complicated or fancy, but I feel like you don’t need ‘fancy’ visualizations to tell a good story. Usually if your point is very sharp and your intent is clear, a basic line chart can get you pretty far.

Then there are really ambitious projects, like Testify, which we launched recently, where we dig into all the court data, millions of cases, and analyze it to hold judges accountable. What my colleagues found in the first story was that people who vote for judges in local elections are not the same people being put away by the judges. 

Miesner: Are there any particular pain points or positive aspects in your current data analysis and visualization tools? 

Li: There’s always a balance between “Is this good enough for publication?” and “Do I need to put more work into this?” 

In journalism, I think that the story or the chart is always as good as what you have the time for. Even though sometimes I look back at things I’ve done in the past and cringe, I know that in that timeframe and with my ability, that was the best I could do. 

Miesner: If you were giving advice to local newsrooms starting in data — what would it be?

Li: Data is not a scary thing; you can do a lot on your own, just using Excel. There are a lot of national newsrooms like The Marshall Project that work with local newsrooms often, so don’t hesitate to reach out. I love hearing from the court reporters and editors who have a dataset they want to do something with and want to work together. I see it as part of our mission, to really advance that and make sure that criminal justice issues are being covered deeply on a local level. 

Miesner: What do you think is the most powerful thing in data visualization? 

Li: Good storytelling. In our industry, I feel that there is this tendency to put journalism into different buckets, but I think at the end of the day, we are all storytellers. We try to make sure that we use the right tools, we use the right format, we tell the story. I have gone through phases where I wanted to experiment with different fancy libraries for different types of charts to see what I can do to show off my skills, but I couldn’t help but notice that some people will be excluded if you do this. So I think that it’s really important to be clear and concise with our visualizations. 

Miesner: What is a project you contributed to that you felt the use of data was innovative or different? Why? 

Li: There are a few things I’ve been experimenting with in my stories. There’s one story that we published using the 2020 census data to look at incarceration trends in different counties. In another story, we used FBI data to see how many crimes are being solved by police departments. With both stories, we have built interactive tables where people can find their own state, their own county and find data relevant to them. 

We also built in data documentation to explain what the data is that they are looking at. This is the line of projects that I am really proud of and want to do more. This is data that demonstrates trends in places that you may be interested in. This is something you can take and empower yourself in whatever pursuits you wish. 

Miesner: Have you found that the ability to narrow down the data to areas that are relevant to readers as well as being able to interact with the data is particularly helpful to readers? 

Li: Stories I’ve worked on with documentation for the data are relatively new; there are only two stories like this so far. I have had many people reach out and tell me that they really appreciate the article and that they are using it in a class they are teaching. 

I hope that for every one person reaching out there are five people who found value in it. We do need to track this better, though. That’s the challenge. This type of project should be interactive at its core. The more people interacting with it, the more people who will know about it and find value in it. 

I also want to say that there are a lot of groups for people who have had their family members incarcerated and I think it’s important for them to recognize that they are not alone. There are a lot of people who feel the same as them. 

Miesner: While innovation can be very exciting, what are some of the scary or negative aspects with this innovation? 

Li: The negative aspect is that we don’t know if it works. At the scale of The Marshall Project we can’t really run a hundred stories and see what works and what doesn’t. Each story has to count, each story has to be able to achieve something to give us the ability to keep doing it. 

I think the exciting part is that I hope we are creating a new genre of journalism. This is doing a service for our reader and this is a way for us to really put the tools and the infrastructure we have access to to use and really add to the greater good. Not every newsroom has the tools we do and the tools come with a lot of power. Not every newsroom of our size has the design and data tools that we have. It’s really exciting that we get to do something like this. 

Miesner: Where do you see data journalism going? What type of future innovations do you see? 

Li: I wish I knew the answer to that. I do think that accessibility will definitely be something we really focus on moving forward. Especially with data visualization, people who cannot see the differences in colors, or cannot see at all, are not able to experience the article the same way, so how can we make that experience as close as possible for them? 

And not everyone is literate, but that doesn’t mean that they shouldn’t have access to the information in the story. That’s something that we definitely want to get better at. That’s absolutely a place that we need improvement in because we owe it to our readers. 

Editor’s note: This interview has been edited for length and clarity by Mikaela Rodenbaugh and Kat Duncan. 
