Making Newsrooms More Data Friendly | Stats + Stories Episode 201 (from the RSS 2021 Conference) / by Stats Stories

Irineo Cabreros (@cabrerosic) is an associate statistician at the RAND Corporation. At RAND he has worked on projects in health care, education, fairness and equity, military personnel, substance use, incarceration, and insurance industries. He is a passionate science communicator who has written for Slate Magazine as an AAAS Mass Media Fellow. His research interests include causal inference, algorithmic equity, experimental design, survey sampling, high-dimensional statistics, latent variable modeling, and statistical genetics, with his focus areas including Labor Markets, Modeling and Simulation, Racial Equity and Survey Research Methodology, among many others.

Episode Description

Newsrooms all over the world are embracing data journalism – looking for unique and thoughtful ways to use data to tell stories about their communities. But is every newsroom handling data as carefully as it should be? What safeguards are in place to ensure journalists are using data in ethical ways? That’s the focus of this episode of Stats and Stories with guest Irineo Cabreros.

Full Transcript

Pennington
Newsrooms all over the world are embracing data journalism, looking for unique and thoughtful ways to use data to tell stories about their communities. But is every newsroom handling data as carefully as it should be? That's the focus of this episode of Stats and Stories, coming to you from the annual meeting of the Royal Statistical Society. I’m Rosemary Pennington. Stats and Stories is a production of Miami University's departments of statistics and media, journalism and film, as well as the American Statistical Association. Joining me as panelists today are John Bailer, Chair of Miami's statistics department, and Brian Tarran, editor of Significance magazine. Our guest today is Irineo Cabreros. Cabreros is an associate statistician at the RAND Corporation, where he's worked on projects in the health care, education, fairness and equity, military personnel, substance use, incarceration, and insurance industries. He's a passionate science communicator who has written for Slate Magazine as an AAAS Mass Media Fellow. His research interests include various statistical methods and focus on such issues as labor markets, modeling and simulation, racial equity and survey research methodology. Cabreros recently wrote an essay for Undark about data journalism, suggesting precautions newsrooms should take to make sure they get things right. Irineo, thank you so much for joining us today.

Cabreros
Thanks for having me.

Pennington
I guess, just to start the conversation, could you talk about what drove you to write this particular essay?

Cabreros
Right, so in 2018, I was an AAAS Fellow at Slate. And basically what that fellowship does is it pairs graduate students with mass media outlets. And so it really throws you into the deep end. It gives graduate students, who may have just a small amount of experience writing a blog or writing a few pieces, the experience of being in a newsroom, interacting with editors and writing full time. So I was at Slate, which was an amazing experience, and to anyone who's interested in getting started in science communication in general, I'd highly recommend it. But one of the experiences I had there that was somewhat frustrating is that I actually tried to write a data journalism piece myself while I was there. It was a really hot summer, and there were a lot of pieces about the heat waves. And I was interested in writing a piece about how, you know, if global warming increases the average global temperature just slightly, how does that influence the probability of these really hot days? And I made some sort of dancing plots and nice visualizations, or at least visualizations that I thought were nice, and at the end of the day, my editor just told me that she didn't think that it was possible to publish the story, because Slate really didn't have the capability of fact-checking this kind of work. And it wasn't something that was really in their wheelhouse. So that was a bit of a frustrating experience for me, but I understood where she was coming from. So fast forward a few years, now I'm at RAND, and I started noticing a lot more pretty sophisticated data journalism coming out of really large mass media outlets. And it started to get me thinking, you know, is this a responsible way to be reporting, to have these analyses that are completely driving the argument of mass media outlets' stories, but have the analyses done entirely in house? And that's basically where my starting point was.
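[Editor's illustration, not part of the recording] The question Cabreros describes, how a slight rise in average temperature changes the probability of very hot days, can be sketched with a simple tail-probability calculation. This is only a minimal sketch under loud assumptions: daily high temperatures are treated as normally distributed, and every number below (`baseline_mean`, `sd`, `hot_day`, `shift`) is hypothetical, chosen for illustration rather than taken from his unpublished piece or from any dataset.

```python
import math


def prob_exceeds(threshold, mean, sd):
    """P(X > threshold) for X ~ Normal(mean, sd), using the complementary error function."""
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical values for illustration only (degrees Celsius).
baseline_mean = 30.0   # assumed average summer daily high
sd = 3.0               # assumed day-to-day standard deviation
hot_day = 36.0         # assumed threshold for a "really hot day"
shift = 1.0            # assumed small increase in the average

p_before = prob_exceeds(hot_day, baseline_mean, sd)
p_after = prob_exceeds(hot_day, baseline_mean + shift, sd)
ratio = p_after / p_before

print(f"P(hot day) before shift: {p_before:.4f}")
print(f"P(hot day) after shift:  {p_after:.4f}")
print(f"Relative increase:       {ratio:.2f}x")
```

Under these assumed numbers, a one-degree shift in the mean roughly doubles the chance of crossing the threshold, because the threshold sits in the thin tail of the distribution. A real analysis would need actual temperature records and a more careful model of the tails.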

Bailer
You know, a couple of things that you said in that article really jumped out when I read it. One was that news now speaks your favorite language, and that the general public is developing a healthy appetite for data too. So that was one of the early quotes. And the other was that a fine line has been crossed, that news outlets have stepped onto the field: they're doing the science themselves. So I thought that was an interesting framing of this. But a question for me was, how have newsrooms changed? I mean, we've talked to people in the past, and they've said, well, newsrooms often had people that had a very different, more historical or political science kind of perspective. And now you're talking about this community embracing people that are thinking with data and telling stories with data. So do you believe that, in fact, you've seen this change of mix?

Cabreros
Um, I probably shouldn't speak to all news outlets, but my impression, or at least my experience at Slate, was that they didn't have the personnel or the expertise to vet these data journalism pieces, and so they made the decision not to. Whether or not other outlets have this capability is kind of unclear to me. I mean, I've had some conversations with people that suggest that, while the on-staff personnel at major news outlets may not have PhDs in a scientific field or something like that, they often do reach out to professors or experts in the field when doing a data journalism piece, to have them double-check what they're doing and give their rubber stamp. Sometimes they'll be included in the article as having contributed in some way. And sometimes they're not included at all: they'll be asked for their expertise, and then there's actually no recognition of the work that they've done. And so that leaves it completely ambiguous to the reader, you know, how much of the work that I'm seeing has had a second pair of eyes on it.

Bailer
That makes me want to ask Brian the same question.

Tarran
Haha, well, I was gonna turn it on to Irineo and say, you know, as somebody who has spent some time in a newsroom, I don't really consider them dangerous places, unless you count, you know, unhealthy eating and gaining weight as a risk. But you describe the newsroom as striking you as a particularly dangerous place to produce science. So why do you think that?

Cabreros
Well, I think the biggest reason is just the size of the platform. Really, any scientific journal, even the major ones, has nothing like the platform of the New York Times or Washington Post or something like this. So the first thing is just the size of the platform. The second thing is the speed at which they produce material. I think probably the number one way that errors in science get caught is not necessarily through peer review, but just because of the extremely slow pace of scientific production. You really have a lot of time to catch your mistakes and do all sorts of quality control when you're working at the pace of science journal editors. When you're working at the pace of a newsroom editor, it's completely different. Stories that are interesting today might be completely irrelevant in a few days, and there's a lot more pressure to produce quickly. And then after it's been produced, it's disseminated to an enormous audience, which is completely different from scientific research.

Tarran
Because you mentioned that, actually, in the article: you're talking about writing nine articles in 10 weeks, was it, compared to two articles in several years in your academic career. So that definitely underlines the difference.

Cabreros
Yeah, during my PhD, it took me a really long time to write just a couple of articles. But nine articles in 10 weeks is extremely slow; I was by far the slowest person around me in terms of output at Slate. So yeah, the pressures are just entirely different. And the timeline is completely different as well.

Bailer
Well, it's also a little bit of an apples and oranges comparison, you know. With the science that you were doing, you may have been collecting the studies, collecting the data, conducting the experiments and doing all of the preliminaries, as well as the literature review and setting the context. Whereas for some of the stories, you're basically dealing with, probably not doing a lot of analysis, other than consuming, processing and packaging. So, I mean, yeah, it's a very different workflow. So it's no surprise that it's a slower workflow. Although still, producing multiple written pieces a week that reflect some understanding of a scientific assertion is no mean feat. I mean, that's something to celebrate. So I'm curious, how did you pick some of these topics that you covered with Slate?

Cabreros
I was really given a lot of liberty to choose what topics I wrote about at Slate. My editor really gave me a lot of leeway in that respect; it was a really great experience. There were also a few pieces where I was just assigned to write a piece. One of the memorable ones, where I was just assigned to write a piece in a short period of time, was about this meme at the time. I don't know if that's the right word, actually. But people were taking a sheet, putting it over their face, and then dropping it and watching the reaction of their dog, who was watching. So the question was, is this somehow psychologically damaging to the dog? Yeah, I can't say I learned too much through writing that piece, except that a lot of the veterinarians who I was calling were really not that amused by my question. But yeah, and then the other thing that sort of drove some of the stories I wrote was, I'm a Muslim, and it was the summer, it was during Ramadan, and a lot of the things I was thinking about were in that vein. So I wrote a piece about some of the science of fasting, and also a piece about this enormous donation of meat that happens at the end of Ramadan. So yeah, it was just really driven by a kind of personal interest, with a couple of assigned stories thrown in there.

Pennington
I wonder, given that you have experience science writing in a newsroom, and you obviously have experience writing as a researcher, what thoughts you have about what might make the fact-checking process for journalism around science communication work in a way that the data can be checked, but also within that very pressing time situation that journalists work against, right? Sometimes they're working against a deadline that's an hour away, and sometimes it might be a week away. And I wonder if you have thoughts about what might be done to sort of help facilitate the fact-checking when these kinds of stories are being produced?

Cabreros
Yeah, I mean, it's extremely hard. Fact-checking data analyses is just inherently a really difficult thing to do. And I think it's a bit unrealistic to expect that, on a journalistic pace, a large amount of code can be fact-checked to the point where everybody's completely satisfied with it before publishing. But what I do think is important, even if that can't be done (and of course, the first line of defense is to do your best and have as much fact-checking as you can before going to print), is having everything that you do be completely reproducible. So I think that's probably the biggest thing. It's understandable, it's difficult to do data journalism, to do data analyses at a journalistic pace. But if all of your work can be accessed by anybody who's interested to go in and poke at it, and see if everything's done correctly, if that's really available, then at least errors can be caught very quickly.

Pennington
You're listening to Stats and Stories at the Royal Statistical Society's annual meeting. Our guest today is RAND Corporation's Irineo Cabreros. In the article for Undark, you suggest some things journalists should keep in mind as they're doing this data journalism. And I wonder if you could talk us through some of your ideas for how to help ensure news organizations that do this kind of data analytic work in their reporting are engaging in best practices.

Cabreros
So I think the first thing, which I sort of already mentioned, is that it would be great if all data journalistic work was really well documented and really available after the fact. Another thing that I think would be important, and is something that's already done in a lot of cases, is involving statisticians and scientists outside of the newsroom to look over the code and look over the models that are being used. And a slight twist on that that I would recommend, something that could potentially be useful, is to always ensure that those scientists that they're using to check this code and to verify the results are always given ample recognition. One easy way to do that is just to have them included underneath the byline, "with contributions by X." I think that would serve a couple of really important functions. First of all, scientists, I found while working at Slate, are generally really, really excited to talk to journalists. It's a kind of rare experience for them, where they get a lot of exposure. And giving them this incentive, that yes, you will be recognized for your input to this story, will be plenty of motivation to have people help out journalists on these data-driven stories. But the second thing that's also important is keeping the people whose task is to check these data analyses, these outside experts, accountable for the work. So if they have their name underneath the byline, and they're recognized for doing this, they're on the hook as well if later somebody finds that there is a mistake. So it gives them a little bit more skin in the game, so to speak. Um, another thing that I didn't mention in the piece, but that I sort of thought of later, was having a really standard way of recognizing the errors that you do make. I think it's inevitable that errors are going to be made. But it needs to be really clear when that happens.
So for instance, there was a piece that was published maybe a month ago in the New York Times about why we're not going to be able to reach herd immunity in the US. And one of the plots that was shown in that piece was a heat map of the United States with vaccine hesitancy rates in each of the counties or in each of the states, I can't remember. But that's not even a very sophisticated task. It's really just a summary of data. But there was a significant error that was made, which the New York Times recognized after people on Twitter pointed it out, which really changed the numbers that appeared in the plot. But if you were to open the story, even if you had seen it before and seen it after, you really might not notice that anything had changed. The color scale on the plot was readjusted such that the figure itself was actually completely identical to the original figure, but just the scale had changed. And the recognition of the error that was made happened in a one-liner at the very, very bottom. This piece was, by the way, picked up as the basis of a column by David Brooks. And there's no obvious recognition that there was a substantial change to the article after the fact. And that's not to sort of put blame on whoever was doing this analysis. I mean, all I do every day is statistics, and I make tons and tons and tons of mistakes all the time. But if that process of me making many mistakes on the way to the final product was cut short, and published in between, I'm sure there would be all sorts of mistakes of my own reaching print as well. But yeah, so that's just a third potential safeguard. One, having everything be completely available and reproducible; two, having scientists who contribute be recognized; and three, having a standardized way of recognizing, in a very obvious manner, any errors that are made.

Tarran
I think that's an important point. Because in the UK, certainly, I don't know so much about the US situation, there has been a history of, you know, big front page splashes being wrong and being corrected in a couple of paragraphs hidden on, you know, page 20, or wherever it is. So obviously, owning your mistakes and being upfront with readers is very important. I was struck by your suggestion of the need for transparency, because we've published a few articles in Significance that are, you know, written by data scientists, by statisticians, calling for greater transparency from other scientists when they publish their work. So, you know, obviously this is an important point, but how do we encourage both science and journalism, which, as you highlight in your article, do have a kind of shared interest, to embrace this need for transparency?

Cabreros
Yeah, that's a completely fair point. A lot of the things that I'm saying are not unique to data journalists. Any scientific piece that is published in a regular scientific journal could also be improved by being completely reproducible. It's slightly less dangerous, in the case of regular scientific research, not to have this requirement of complete reproducibility, though I think it would still be great, and every article would be improved by having complete reproducibility. But I think, because of the difference in time scale and how quickly these things are being produced, there's slightly less risk from having these errors in a conventional science outlet. And there's also the additional safeguard of having a rigorous peer review process. But that being said, yeah, I think as much as possible, aside from data privacy concerns that might keep you from having your data and your analyses be reproducible, it's completely true that every analysis, whether journalistic or conventional scientific analysis, would benefit from reproducibility. It adds a lot of work for the scientists to do that. And because of that added work, it also adds time. So that's another thing that competes with the journalistic output, but I think it's completely worth it and completely necessary.

Bailer
I do like some of these ideas that you're forwarding. I mean, I think that this idea of the recognition of the contribution is really important. Because, you know, a lot of times if people are engaged in this, they may be just doing it out of their hide, out of a sense of sort of being good citizens or contributing. But it's good to be not just a good citizen, but also have, like you say, some stake in this, that you have some ownership in this process and some recognition for the contribution. You know, many employers might say, "Well, what are you doing?" "Well, I'm helping this journalist out." "Well, show me how," you know, right? So I think it's nice to have that type of documentation. The idea of openness and reproducibility and the transparency that you've been promoting, it seems like there's a real change and a shift in culture, and it seems like it's embraced now in ways that I don't remember ever seeing to this degree. So I think that it's certainly falling on much friendlier ears than before. So I mean, I think that you're raising a call that I think you're gonna find others will echo. I'd like to, if I could, change gears just a wee bit and talk about some of your writing in Slate. And I am going to say that I've really enjoyed some of the turns of phrase that you have in your writing. And it seems like you've really honed your skills to write for this more general audience. I mean, when you wrote, for example, about tipping the scales favorably on Judgment Day versus tipping a different type of scale, in this fasting piece that you have. And so I was just curious, how did you develop this skill?
And, you know, ultimately, how did you hone this skill to be able to express this to a more general audience, in a rather tightly formatted version? I mean, you just have a couple of pages to convey this result.

Cabreros
Well, I think the writing skill specifically is just something that I kind of consciously worked at while at Slate. And that was probably one of the reasons why my output was so comparatively small, as I took a lot of time writing these pieces. In terms of other, less writing-specific things that I think have influenced the way that I convey ideas to a more general audience, that comes from just my teaching background. I think that's really where it all started. You know, my interest in science communication really all started with just an interest in teaching. And yeah, as an undergraduate, I kind of got addicted to TAing in courses. And that sort of led to other teaching experiences. I taught middle school kids in Namibia one summer, and in graduate school, I taught prisoners in New Jersey. And just getting this experience talking to people from very different backgrounds and trying to convey something to them really, I think, influenced the way that I write. I think I place a lot of importance on analogies. I think that's one thing that I learned from teaching: you can get a lot of mileage out of a really strong analogy when you're trying to explain something pretty complicated. And also just examples. I think having very clear examples is really the only way, whether in teaching or in writing, that you can hope to convey a really complex topic.

Pennington
Well, that's all the time we have for this episode of Stats and Stories. Irineo, thank you so much for joining us today.

Cabreros
Thanks for having me.

Pennington
And I'd like to thank the Royal Statistical Society for hosting us again this year. Stats and Stories is a partnership between Miami University’s Departments of Statistics, and Media, Journalism and Film, and the American Statistical Association. You can follow us on Twitter, Apple podcasts, or other places you can find podcasts. If you’d like to share your thoughts on the program send your email to statsandstories@miamioh.edu or check us out at statsandstories.net, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.