The Coding Issue
Whether data journalists need to program remains an active debate. But when we looked into the question, we found that we first needed to define what "coding" means in the context of data journalism. To some, "code" means web development and design, built on HTML and CSS. To others, "programming" means writing programs that enable advanced data mining, or algorithms that can identify patterns.
The bottom line is that to do more advanced data journalism, practitioners need, at a minimum, to understand how programming works. This could be considered the start of computational thinking. Even scraping information from the Web can involve simple programming in Python, and understanding what programmatic solutions make possible is critical for journalists examining websites and other troves of information, much of which is not neatly arranged in rows and columns.
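To make "simple programming in Python" concrete, here is a minimal sketch of a scraper that pulls the rows of an HTML table into Python lists using only the standard library's `html.parser` module. The `TableScraper` class, the `scrape_table` function, and the sample table (agencies and inspection counts) are hypothetical, made up for illustration; real pages are usually messier, and many journalists reach for third-party libraries such as Beautiful Soup instead.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Hypothetical helper: collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # the row currently being built, if any
        self._cell = None  # text fragments of the cell currently being built

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and trim surrounding whitespace for a clean cell value.
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside a cell (e.g. indentation between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def scrape_table(html):
    """Return the table rows found in an HTML string as lists of strings."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical page fragment with invented numbers. A real scraper would first
# download the page, e.g. urllib.request.urlopen(url).read().decode("utf-8").
SAMPLE_PAGE = """
<table>
  <tr><th>Agency</th><th>Inspections</th></tr>
  <tr><td>Health Dept.</td><td>412</td></tr>
  <tr><td>Fire Dept.</td><td>97</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in scrape_table(SAMPLE_PAGE):
        print(row)
```

Once the rows are plain Python lists, the standard `csv` module can write them to a file that opens in any spreadsheet program, turning a web page into data a reporter can sort, filter, and check.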
As students develop the ability to recognize computational solutions to some of these problems, some of them may then learn how to program. But even those who don’t take the coding path should still be able to understand how solutions like these can be a part of their journalistic practice. The ability to work with data and think in terms of computation is a skill broader and more necessary than any specific tool or programming language. It is vital that we don’t confuse the two.
Mark Hansen, a professor of journalism at Columbia and director of its Brown Institute for Media Innovation, also focuses on teaching both programming skills and the mindset of computational journalism. The idea, according to Hansen, is that by using programming, journalists can think beyond rows and columns as they search for answers in data of all forms, whether structured or unstructured.
Nicholas Diakopoulos, a computer scientist at the University of Maryland, has been teaching a number of data journalism classes beyond the introductory level, and he also offers a coding course for sophomores. His aim, he said, is to move undergraduates beyond basic CSS/HTML web skills toward an understanding of web development and news apps. Beyond that, he offers a class on computational journalism that focuses on Python, text analysis and aggregation, recommender systems, and writing stories backed by code.
Diakopoulos suggested that students could also take computer science classes if they want to learn to be hackers, in the original sense of the word: people fluent enough with computers to use them creatively.
It's all about working with data in a principled way, Diakopoulos said. He ties this to computer-assisted reporting (CAR) and to Philip Meyer's crusade, some 50 years ago, to bring scientific thinking into journalism: thinking methodically about how to frame an experiment, gather data, and use rigorous methods to build evidence for a finding of journalistic importance.
In the end, data journalism is about teaching how to find the story, using an increasing array of data techniques, said David Donald, data journalist in residence at the School of Communication at American University and data director of AU’s Investigative Reporting Workshop. “You’re still talking about story and how data needs to be vetted and be expressed in a way that gets into the public’s brain easily,” Donald said. “From the investigative side, you are looking for evidence in the data.”
Developing that computational ability will become even more important for handling the vast amounts of data in today's world. Tools will come and go, but data journalism, at its core, will enable journalists to do their jobs in a more expansive way, said Coll of Columbia.
“I think that [data journalism] will be around for a while,” Coll said. “It will be around for a whole set of iterations of platforms and distribution systems, and even media. So we get virtual reality, or we get 3D, or we don’t. That’s a whole different set of questions. This is going to be about how you report on government, how you report on corporations, how you tell wheat from chaff.”
Data journalists are starting to address this type of coverage, but it takes a deeper level of data journalism capability—the computational journalism slice of data journalism. Some of it involves presenting data in new, journalistic ways. The “Surgeon Scorecard,” published by ProPublica, is one example. ProPublica used extensive Medicare data and collaborated with leaders in the field to evaluate the performance of surgeons.
In other cases, computational journalism at this level means examining information in new, more complex ways. Examples include the Wall Street Journal's coverage of the Medicare system, which received the 2015 Pulitzer Prize for Investigative Reporting, and Reuters' 2014 project examining influence at the Supreme Court.