Appendix

Tables from Our Analysis

Classes Offered by Subject at ACEJMC-Accredited Journalism Programs

Data Journalism

Number of Classes        Number of Programs    Percent of Total
No classes               54                    48%
One class                27                    24%
Two classes              14                    12%
Three or more classes    18                    16%

Classes with Data Journalism as a Component

Number of Classes        Number of Programs    Percent of Total
No classes               44                    38%
One class                31                    27%
Two classes              22                    19%
Three classes            9                     8%
Four or more classes     7                     6%

Multimedia

Number of Classes        Number of Programs    Percent of Total
No classes               20                    18%
One class                31                    27%
Two classes              12                    11%
Three classes            16                    14%
Four or more classes     34                    30%

Programming Beyond HTML/CSS

Number of Classes        Number of Programs    Percent of Total
No classes               99                    88%
One class                6                     5%
Two classes              5                     4%
Three or more classes    3                     3%

Note: This analysis of programming classes covers only courses taught within a journalism program. A fair number of schools pointed to collaborations with other departments through which journalism students could take advanced programming or computer science classes.

Notable Stories

Below we list several examples, for reference, of stories that are emblematic of the categories we define in Chapter 1.

Data Reporting

  • “Drugging Our Kids,” San Jose Mercury News, 2014
  • “Methadone and the Politics of Pain,” The Seattle Times, 2012

Data Visualization and Interactives

  • ProPublica’s “Dollars for Docs,” 2010
  • The Washington Post’s visualization of the missing Malaysian jet, 2014

Emerging Journalistic Technologies

Drone Examples:

  • “Tanzania: Initiative to Stop the Poaching of Elephants,” CCTV Africa, 2014

Because of regulatory constraints at the Federal Aviation Administration, the use of drones for journalism is not yet widespread, despite significant interest from both industry and academia. Uses foreseen once regulations become more permissive include news photography and videography; scanning news locations for use in 3D models and 360-degree video applications; remotely sensed data gathering through visible or multispectral imagery; mapping areas of interest at higher temporal resolution than is currently available; and distributing sensors or gathering sensor-based data.

Sensor Examples:

  • WNYC’s Cicada Tracker project in 2013 recruited interested listeners to use sensors to identify where cicadas would emerge.
  • USA Today’s “Ghost Factories” investigation in 2012 used X-ray gun sensors to scan the soil.
  • The Houston Chronicle’s 2005 investigative story “In Harm’s Way” used sensors to examine air quality near oil refineries and factories.

Virtual and Augmented Reality Examples:

  • The New York Times sent out more than a million Google Cardboard kits to subscribers in 2015 as it launched its first VR story, “The Displaced,” a piece detailing children displaced by war.
  • Stanford University’s Department of Communication, home to the Stanford Virtual Human Interaction Lab, has scheduled a VR class for the winter 2016 quarter as part of its curriculum for its master’s in journalism program.

Computational Journalism

Story Examples:

  • The 2014 Wall Street Journal investigation into Medicare
  • “The Echo Chamber,” a 2014 Reuters investigation into influence at the Supreme Court

Platform Example:

  • DocumentCloud, a repository for source documents, or Overview, a document-mining tool developed by Jonathan Stray

Tools, Resources and Methods Discussed in the Report

The ethics of software may also shape decisions about the tools and techniques you teach. “Free” software is licensed to promote freedom of computing, in a manner analogous to freedom of speech: it may be copied, altered, used, and shared without restriction. A related form of licensing, known as “open source,” is very similar to free software but places its emphasis on the public availability of the code.

Proprietary software has certain advantages of its own. The interface design is often more polished, support services are provided, and in some cases it simply performs better on demanding tasks.

But the gap between free and proprietary software has narrowed in recent years, and many professionals now prefer free and open-source (F/OSS) software on more than ideological grounds. F/OSS applications are often more secure because their code can be openly vetted by security researchers. For the same reason, particularly popular applications may attract many talented and dedicated developers, along with a support community of fellow users rather than a call center or online help desk.

Given the expense of proprietary software and its inevitable obsolescence, there are few advantages to using these applications in data and computation classes instead of free and open-source ones.

Guide to Common Tools for Data and Computational Journalism

The following list of common tools for data and computational journalism is quoted from the Lede Program at Columbia.

Programming Languages

C is a heavy-lifting programming language and the language of choice in the Computer Science Department. It’s far faster than Python or JavaScript and introduces you to the nitty-gritty of computer science.

Git is something called a version control system—it’s not a programming language, but programmers use it often. Version control is a way of keeping track of the history of your code while providing a structure that encourages collaboration. GitHub is a popular cloud-based service built around Git, and we make heavy use of it during the Lede Program.

HTML isn’t technically a programming language; it’s a markup language (HyperText Markup Language, to be exact). HTML tells your browser what the different parts of a web page are, and you use it extensively when learning to scrape web pages.

JavaScript is a programming language that’s in charge of interactivity on the Web. When images wiggle or pop-ups annoy you, that’s all JavaScript. The popular interactive data visualization framework D3 is built using JavaScript.

Python is a multipurpose programming language that is equally at home crunching numbers, parsing text, or building Twitter bots. We use Python extensively in the Lede.
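
As a rough illustration (not drawn from the Lede’s materials), here is a minimal sketch of the kind of text crunching Python makes easy: counting how often each word appears in a snippet of text.

    # Count word frequencies in a snippet of text with the standard library.
    from collections import Counter

    text = "the quick brown fox jumps over the lazy dog the end"
    counts = Counter(text.split())

    print(counts.most_common(3))  # e.g., [('the', 3), ('quick', 1), ('brown', 1)]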

R is a programming language that is used widely for mathematical and statistical processing.

Tools for Data and Analysis

Beautiful Soup and lxml are tools used for taking data from the Web and making it accessible to your computer.
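
As a hedged sketch of how these tools fit together, the snippet below uses Beautiful Soup (with lxml as its underlying parser) to pull headlines out of a fragment of HTML. The markup and the "headline" class name are invented for the example, standing in for a page you might download.

    # Parse an HTML fragment and extract every headline element.
    from bs4 import BeautifulSoup

    html = """
    <html><body>
      <h2 class="headline">City council passes budget</h2>
      <h2 class="headline">School board delays vote</h2>
    </body></html>
    """

    soup = BeautifulSoup(html, "lxml")  # lxml does the low-level parsing
    for tag in soup.find_all("h2", class_="headline"):
        print(tag.get_text(strip=True))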

D3 is a JavaScript library for building custom data visualizations.

IPython Notebooks provide an interactive programming environment that encourages documentation, transparency, and reproducibility of work. When you’re done with your analysis, you’ll be able to put your work up for everyone to see—and check!

NLTK (Natural Language Toolkit) is a Python library built to process large amounts of text. Whether you’re analyzing congressional bills, Twitter outrages, or Shakespearean plays, NLTK has you covered.
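
A minimal sketch of NLTK at work, assuming the package and its “punkt” tokenizer models are installed: tokenize a sentence and count word frequencies.

    # Tokenize text and tally word frequencies with NLTK.
    import nltk

    nltk.download("punkt", quiet=True)  # fetch tokenizer models once

    text = "To be, or not to be, that is the question."
    tokens = nltk.word_tokenize(text)
    freq = nltk.FreqDist(w.lower() for w in tokens if w.isalpha())

    print(freq.most_common(3))  # e.g., [('to', 2), ('be', 2), ('that', 1)]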

OpenRefine (previously Google Refine) is downloadable software that helps you sort and sift dirty data, cleaning it to the point where you can start your actual analysis.

Pandas is a high-performance data analysis tool for Python.
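
To make that concrete, here is a minimal, hypothetical sketch: loading a salaries.csv file (an invented file with name, department, and salary columns) and summarizing pay by department.

    # Load a CSV into a DataFrame and aggregate one column by another.
    import pandas as pd

    df = pd.read_csv("salaries.csv")  # hypothetical file: name, department, salary
    by_dept = df.groupby("department")["salary"].median().sort_values(ascending=False)

    print(by_dept.head(10))  # ten best-paid departments by median salary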

QGIS (geographic information system) is an open-source tool used to work with geographic data, from reprojecting and combining data sets to running analyses and making visualizations.

Scikit-learn is a Python package for machine learning and data analysis. It’s the Swiss Army knife of data science: it covers classification, regression, clustering, dimensionality reduction, and much more.
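
As a brief, hedged sketch of the workflow scikit-learn encourages, the snippet below fits a decision tree to the library’s built-in iris data set and reports accuracy on held-out data.

    # Split data, fit a classifier, and score it on unseen examples.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = DecisionTreeClassifier().fit(X_train, y_train)
    print(f"accuracy: {model.score(X_test, y_test):.2f}")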

Web scraping is the process of taking information off of websites and making use of it on your computer. Documents often aren’t available in accessible formats, and you need to scrape them in order to process and analyze them.

Data Formats

An API (application programming interface) is a way for computers to communicate with one another. For us, this generally means sharing data. We’ll be coding up Python scripts to talk to and request data from machines around the world, from Twitter to the U.S. government.
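
A minimal sketch of that request-and-response cycle, using the requests library against GitHub’s public API; any JSON-speaking API would follow the same pattern.

    # Request data from a web API and read the JSON it returns.
    import requests

    resp = requests.get("https://api.github.com/users/octocat")
    resp.raise_for_status()  # stop early on an HTTP error

    user = resp.json()       # the API answers in JSON
    print(user["login"], user["public_repos"])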

CSVs (comma-separated values) are the most common format for data. A CSV is a quick export away from Excel or Google Spreadsheets, and you’ll find yourself working with CSVs more often than any other format. Although “comma-separated” is in the name, a file in this family can use tabs, pipes, or any other character as a field delimiter (the tab-separated variant is often called a TSV).
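
A small sketch with Python’s standard csv module makes the point; the reporter data here is invented, and switching the delimiter argument is all it takes to read a TSV instead.

    # Read comma-separated records; pass delimiter="\t" for tab-separated ones.
    import csv
    import io

    data = "name,beat,year\nAda,courts,2014\nGrace,city hall,2015\n"
    for row in csv.DictReader(io.StringIO(data)):
        print(row["name"], row["beat"])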

GeoJSON and TopoJSON are specially formatted JSON files that contain geographic data.

JSON stands for JavaScript Object Notation, and it’s a slightly more complicated format than a CSV. It can contain lists, numbers, strings, sub-items, and all sorts of complexities that are great for expressing the nuance of real-world data. Data from an API is often formatted as JSON.
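
A minimal sketch of that nesting, parsed with Python’s standard json module (the record itself is invented):

    # Parse a JSON string containing a list and a nested sub-item.
    import json

    raw = '{"name": "Ada", "beats": ["courts", "crime"], "contact": {"city": "Chicago"}}'
    record = json.loads(raw)

    print(record["beats"][0])         # courts
    print(record["contact"]["city"])  # Chicago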

SQL (Structured Query Language) is a language for talking to databases. You’ll sometimes find data sets in SQL format, ready to be imported into your database system of choice.
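
For a flavor of the language, here is a minimal sketch using Python’s built-in sqlite3 module and an invented grants table; the SQL itself would look much the same in any database system.

    # Create a table, insert rows, and total amounts by agency with SQL.
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE grants (agency TEXT, amount REAL)")
    con.executemany("INSERT INTO grants VALUES (?, ?)",
                    [("HUD", 25000.0), ("DOT", 40000.0), ("HUD", 10000.0)])

    for agency, total in con.execute(
            "SELECT agency, SUM(amount) FROM grants GROUP BY agency"):
        print(agency, total)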

Tech Team Report

Another useful resource for understanding the tools of data journalism was prepared at Stanford by an interdisciplinary team of computer science and data journalism students in a spring 2015 course on watchdog reporting. The report is available at: http://cjlab.stanford.edu/tech-team-report/

Resources

Online Courses and MOOCs

Recommended Texts on Data Analysis

  • John Tukey, Exploratory Data Analysis (Upper Saddle River, NJ: Pearson Education, 1977)
  • James A. Davis, The Logic of Causal Order (Thousand Oaks, CA: Sage, 1985)
  • Robert P. Abelson, Statistics as Principled Argument (Hillsdale, NJ: Lawrence Erlbaum Associates, 1995)

Useful Data Sets for Classwork and Assignments

Data Journalism Articles, Projects, and Reading Lists Used in Instruction

MOOC Examples:

Lede Program Curriculum

The Lede Program at Columbia Journalism School is a post-baccalaureate program in which students from a variety of backgrounds learn data and computation skills over the course of one or two semesters. The program was designed to help students rapidly elevate their skills in these areas, especially if they were considering applying to Columbia’s highly demanding dual-degree program in journalism and computer science.

In the context of this report, the one-semester version of the Lede represents a promising “extended boot camp” in which students who have been accepted into a data journalism master’s program may attend for a full summer before their peers in order to develop the skills that will help them get the most out of their education.

The following course descriptions were pulled on November 5, 2015, from: http://www.journalism.columbia.edu/page/1060-the-lede-program-courses/908

Foundations of Computing

During this introduction to the ins and outs of the Python programming language, students build a foundation upon which their later, more coding-intensive classes will depend. Students clean, parse, and process dirty, real-world data sets while recreating modern journalistic projects. The course also touches on basic visualization and mapping, and on using public resources such as Google and Stack Overflow to build self-reliance.

Focus: Familiarize yourself with the data-driven landscape

Topics & tools include: Python, basic statistical analysis, OpenRefine, CartoDB, pandas, HTML, CSVs, algorithmic story generation, narrative workflow, csvkit, git/GitHub, Stack Overflow, data cleaning, command line tools, and more

Data and Databases

Students will become familiar with a variety of data formats and methods for storing, accessing, and processing information. Topics covered include comma-separated documents, interaction with website APIs and JSON, raw-text document dumps, regular expressions, text mining, SQL databases, and more. Students will also tackle less accessible data by building web scrapers and converting difficult-to-use PDFs into usable information.

Focus: Finding and working with data

Topics & tools include: SQL, APIs, CSVs, regular expressions, text mining, PDF processing, pandas, Python, HTML, Beautiful Soup, IPython Notebooks, and more
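
As a hedged illustration of one item on that list (not taken from the course itself), the snippet below uses a regular expression to pull dollar amounts out of raw text:

    # Find every dollar amount in a passage with Python's re module.
    import re

    text = "The contract was amended from $1,250,000 to $980,000 last March."
    print(re.findall(r"\$[\d,]+", text))  # ['$1,250,000', '$980,000']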

Algorithms

Machine learning and data science are integral to processing and understanding large data sets. Whether you’re clustering schools or crime data, analyzing relationships between people or businesses, or searching for a single fact in a large data set, algorithms can help. Through supervised and unsupervised learning, students will generate leads, create insights, and figure out how best to focus their efforts with large data sets. Students will also develop a critical eye toward applications of algorithms, uncovering the pitfalls and biases to look for in their own and others’ work.

Focus: Analyzing your data

Topics & tools include: linear regression, clustering, text mining, natural language processing, decision trees, machine learning, scikit-learn, Python, and more
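
To ground the idea of unsupervised learning named above, here is a minimal sketch (with invented numbers) that clusters precincts by two crime counts using k-means from scikit-learn:

    # Group similar rows into two clusters without any labels.
    from sklearn.cluster import KMeans

    counts = [[2, 1], [3, 2], [2, 3],        # low-crime precincts
              [30, 28], [31, 35], [29, 33]]  # high-crime precincts

    model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(counts)
    print(model.labels_)  # e.g., [0 0 0 1 1 1]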

Data Analysis Studio

In this project-driven course, students refine their creative workflow on personal work, from obtaining and cleaning data to final presentation. Data is explored not only as the basis for visualization, but also as a lead-generating foundation, requiring further investigative or research-oriented work. Regular critiques from instructors and visiting professionals are a critical piece of the course.

Focus: Applying your skillset

Topics & tools include: Tableau, web scraping, mapping, CartoDB, GIS/QGIS, data cleaning, documentation, and more
