Model 3: Concentration in Data & Computation
A data journalism concentration should begin with several core, required classes before moving into a track of electives offering data journalism analysis, visualization, and online research/backgrounding.
The curriculum detailed below should provide a framework for a school to begin offering specialized coursework to students who wish to concentrate in data-driven reporting or computational journalism.
This section describes some of the courses that may form such a degree. Depending on the availability of instructors and other resources, classes like these may form either the mandatory core of a concentration in data and computation, or else a range of electives.
Please note that we would not expect any journalism school to offer all of these classes, nor only these, in its data and computational curriculum. This is just one picture of the skills and thematic exposure that could constitute a journalism degree specializing in data and computation.
Core Classes Required for Concentration in Data & Computation
Foundations of Data Journalism
This is the course outlined in the opening of this chapter (see full description on page 50) as a requirement for all journalism students. If students enter journalism school without declared concentrations, this introduction will be suitable for future data concentrators to learn the basics before proceeding to other required courses and electives. Schools may also choose to require applicants to be specifically accepted into the data concentration, in which case it may be advisable to offer a summer boot camp (see “Note on Incoming Skills, Technical Literacies, and Specialized Boot Camps,” page 74) to get students up to speed on the tools and methods they will need. In this case, data concentrators may be placed in a more advanced fall foundations course with their peers.
Introduction to Journalistic Programming
Course description: The purpose of this course is to introduce students to several foundational computer-programming skills that they will use to find and tell stories. The course should be required for students who concentrate in data and computation, but it should also be open to students from other tracks.
Course structure: Meets twice weekly, first for lecture and then for an intensive workshop.
- Test command-line proficiency with a quiz, or with a screencast demonstrating completion of a series of tasks using Bash alone.
- Story assignment reported and submitted in a Jupyter/IPython notebook.
Statistics for Journalism
Course description: The methods and principles of statistics have proven to be powerful tools in the hands of journalists. This course should be a rigorous introduction to statistical work taught from within a framework of journalistic concerns. That means the course is story-based, in the sense of precision journalism and the computer-assisted reporting (CAR) tradition.
Course structure: Weekly lectures with in-class exercises, regular homework, and a final exam.
Skills: Developing and testing hypotheses; understanding and applying the central limit theorem, normal distribution, and confidence intervals; frequentist versus Bayesian statistics; linear regression; analysis of variance.
Tools: RStudio, Excel, MySQL, Microsoft Access, SAS, SPSS (proprietary) or PSPP (F/OSS).
- Analyze crime statistics, look for a trend, and try to explain its cause.
- Look at the distribution of cancer cases and try to decide if there is evidence of an increase in more polluted areas.
- Analyze statistical evidence from U.S. and international cases to predict whether reducing the number of guns would have an effect on gun violence.
- Analyze the stats in a research paper and report them in plain language.
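The confidence-interval skill listed above can be illustrated in a few lines of Python. This is a minimal sketch, assuming a simple random sample and the normal approximation for a proportion; the function name `proportion_confidence_interval` and the poll figures are hypothetical, chosen only for demonstration.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Return (low, high) for a sample proportion using the normal approximation.

    Assumes a simple random sample that is large enough for the
    central limit theorem to apply.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    # z is the critical value: about 1.96 for 95 percent confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - margin), min(1.0, p + margin)


# Hypothetical poll: 520 of 1,000 respondents favor a measure.
low, high = proportion_confidence_interval(520, 1000)
print(f"52% support, 95% CI: {low:.1%} to {high:.1%}")
```

In this hypothetical poll the interval spans 50 percent, so a story could not responsibly claim majority support from these numbers alone.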
Distribution of Electives
For the concentration, the school may offer elective courses to fulfill requirements in two or three areas of data and computational work. We have divided these into three categories: presentation/visualization, analysis for story, and journalistic programming. As a matter of designing degree requirements, a program might choose to require at least one class from each category in addition to fulfilling overall credit requirements.
Presentation & Visualization
- Data Visualization
- Visual Journalism with Data and Computation
- Advanced Data Visualization
- Advanced Journalistic Mapping
Analysis for Story
- Writing About Data
- Statistical Analysis for Journalism
- Advanced Computational Reporting Methods (Using CAR)
Journalistic Programming
- Introduction to Journalistic Programming
- Methods of Collecting Data and Automating Reporting
- News App Development
- Advanced Computational Journalism
Elective Coursework: Graduate Degree with Concentration in Data & Computation
Methods of Collecting Data & Automating Reporting
Course description: This course focuses on developing expertise in gathering data, cleaning it, storing it in a database, and retrieving it with ease. It also emphasizes building automated tools to serve as data sources in reporting.
Course structure: Weekly workshop or lab-based instruction.
Skills: Web scraping, APIs, cron jobs, bash scripting, digitizing paper documents, regular expressions, parsing text and data, fuzzy string matching, record linkage, content analysis.
Tools: Python, Beautiful Soup, Mechanize, Scrapy, Tabula, SQL, MongoDB, data formats (CSV, JSON), Tesseract (OCR), Twitter bots.
- Classwork: Design Google Alerts to monitor subjects of interest.
- Homework: Write a program to scrape the Congressional Record for everything a particular representative has said on the floor of the House.
- Homework: Write a web scraper in Python and automate it with a cron job.
- Homework: Build a web app or Twitter bot to post useful information from an API.
- Group project: Build a sensor network to automatically post temperature or air quality measurements online.
- Final project: Gather a useful body of data, previously unavailable, and share it publicly.
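The scraping homework above might start from something like the following sketch. To stay runnable without installing anything, it uses Python's standard-library `html.parser` rather than Beautiful Soup, and it parses an inline HTML fragment instead of fetching a live page. The fragment, the class name `LinkCollector`, and the helper names are all hypothetical.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page fragment standing in for a downloaded web page.
SAMPLE_HTML = """
<ul>
  <li><a href="/reports/2015-01.pdf">January report</a></li>
  <li><a href="/reports/2015-02.pdf">February report</a></li>
  <li><a href="/about">About this office</a></li>
</ul>
"""


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def scrape_pdf_links(html):
    """Return only the links that point to PDF files."""
    parser = LinkCollector()
    parser.feed(html)
    return [(href, text) for href, text in parser.links if href.endswith(".pdf")]


def to_csv(rows):
    """Serialize the scraped rows as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["url", "title"])
    writer.writerows(rows)
    return buffer.getvalue()


pdf_links = scrape_pdf_links(SAMPLE_HTML)
print(to_csv(pdf_links))
```

To automate a script like this, a crontab entry such as `0 6 * * * python3 scraper.py` would run it every day at 6 a.m.; the file name `scraper.py` is a placeholder.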
Visual Journalism with Data and Computation
Course description: This course covers a range of methods, media, and formats for the graphic presentation of information. Readings should introduce principles of visual design and integrate these into regular assignments. Beginning with a fairly basic program like Tableau, the class should highlight the effective and accurate presentation of information in graphic form. By the middle of the term, students should branch out into using a programming library such as D3 to design their own graphics outside the constraints of existing software.
Course structure: Weekly seminar to discuss readings, followed by hands-on workshop.
Skills: Data visualization, news apps, GIS/mapping for presentation.
- Homework: Use Tableau to find a story in a previously unexplored data set.
Advanced Data Analysis & Journalistic Algorithms
Course description: This course should build upon the core, required classes to bring together data and computation for finding stories and making predictions using algorithmic and computational analysis.
Course structure: Weekly lecture and workshop with regular homework and a final project.
Skills: Python for machine learning, clustering, classifying documents, standardizing and matching algorithms.
Tools: R, Python (pandas, Matplotlib, SciPy, scikit-learn), clustering and classification algorithms (k-means, k-nearest neighbors), topic modeling algorithms (LDA or NMF).
- Classwork: Use record linkage for data cleaning. For example, analyze Federal Election Commission data to find top donors, which requires standardizing donor names, a task best done with machine learning.
- Homework: Analyze State of the Union speeches since 1790 to make a visualization of how key topics have changed over time.
- Homework: Implement clustering to detect outliers in a data set.
- Final project option: Build an election or market prediction model.
- Final project option: Reverse engineer a pricing, lending, or credit score algorithm.
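The record-linkage classwork above depends on treating differently formatted names as the same donor. The sketch below shows the idea with rule-based normalization plus `difflib.SequenceMatcher` from the standard library, used here as a simple stand-in for the machine-learning approach the assignment has in mind. The donor records are invented, and the 0.85 similarity threshold is an assumption that would need tuning on real data.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher

# Invented records for illustration only; not real contribution data.
DONATIONS = [
    ("Smith, John A.", 2500),
    ("SMITH, JOHN", 1000),
    ("John Smith", 500),
    ("Garcia, Maria", 2000),
    ("Maria Garcia", 1500),
    ("Lee, Dana", 300),
]


def normalize(name):
    """Lowercase, reorder 'Last, First' to 'first last', and drop initials."""
    if "," in name:
        last, first = name.split(",", 1)
        name = f"{first} {last}"
    tokens = re.findall(r"[a-z]+", name.lower())
    # Drop single-letter tokens such as middle initials.
    return " ".join(t for t in tokens if len(t) > 1)


def link_records(records, threshold=0.85):
    """Greedily group records whose normalized names are similar enough."""
    totals = defaultdict(int)
    for name, amount in records:
        key = normalize(name)
        # Reuse an existing group if its name is close to this one.
        for existing in totals:
            if SequenceMatcher(None, key, existing).ratio() >= threshold:
                key = existing
                break
        totals[key] += amount
    return dict(totals)


totals = link_records(DONATIONS)
for donor, amount in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{donor}: ${amount:,}")
```

Greedy matching like this is easy to follow but can merge different people who share a name, so any linked totals should be checked against other fields, such as address or employer, before publication.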
Advanced Data Visualization
Course structure: The course may alternate between seminars (high-level reading, discussion, and analysis of visual communication and information design principles, focusing on where visualization is most effective and where it can mislead) and lab classes (advanced practical instruction in applications and coding frameworks for information design).
Skills: Designing for clarity, precision, impact.
- Homework: Regular data assignments in different media: static web, video, interactive.
- Final project: An original analysis of unexplored data, presented in an original visualization programmed more or less from scratch, with cross-platform consistency.
Advanced Journalistic Mapping
Course description: This course should build on previous coursework in mapping to cover more advanced manipulations of data, to develop a higher degree of design sophistication, and to develop a high level of news judgment in the selection of timely, compelling, and original topics. Students should use GIS technologies, join spatial data with other information, and apply density and other spatial analysis to inform stories, not just to build presentations.
Course structure: Hands-on workshop and lab.
Skills: Clustering, binning, heat maps, joining different geographic data sets.
Tools: Both GIS analysis software and presentation software, including Esri, QGIS, CartoDB, and Leaflet, as well as sensors such as DustDuino.
- Homework: Weekly pitches and journalistic mapping assignments.
- Final project: An interactive map or set of maps telling a story about a timely or unexplored subject, and/or a narrative story using findings from the mapping analysis.
- Class project: Build a sensor network or otherwise amass an unexplored data set, then work in small groups to build a package of maps to explore the data.
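The binning and heat-map skills listed for this course come down to counting points per grid cell. Here is a minimal sketch in plain Python, assuming square cells measured in degrees; the coordinates are invented, and the names `CELL_SIZE` and `bin_points` are hypothetical.

```python
import math
from collections import Counter

CELL_SIZE = 0.01  # Cell width and height in degrees (an assumed value).

# Invented (longitude, latitude) points for illustration only.
POINTS = [
    (-73.985, 40.758),
    (-73.984, 40.757),
    (-73.981, 40.752),
    (-73.962, 40.779),
    (-73.961, 40.778),
    (-73.986, 40.759),
]


def bin_points(points, cell_size=CELL_SIZE):
    """Count points per square grid cell.

    Each cell is identified by the integer (column, row) index of its
    lower-left corner, measured in units of cell_size.
    """
    counts = Counter()
    for lon, lat in points:
        cell = (math.floor(lon / cell_size), math.floor(lat / cell_size))
        counts[cell] += 1
    return counts


counts = bin_points(POINTS)
for (col, row), n in counts.most_common():
    print(f"cell at lon {col * CELL_SIZE:.2f}, lat {row * CELL_SIZE:.2f}: {n} points")
```

Cells defined in degrees do not cover equal areas at different latitudes, so published work would normally do this step on projected coordinates in GIS software such as QGIS.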
Advanced Journalistic Text Mining
Course description: Text is data. The purpose of this class is to teach journalists to gather, analyze, and present stories using large amounts of textual data. This may build on material from the course “Methods of Collecting Data and Automating Reporting.”
Course structure: Weekly lecture and lab, working toward a final project.
Skills: Web scraping, analyzing large bodies of text, sentiment analysis, topic modeling.
Tools: Overview, DocumentCloud, Natural Language Toolkit (NLTK) or Stanford NLP.
- Homework: Use sentiment analysis to measure how the tone of a story changed between an earlier and a later version.
- Final project: Build a scraper to crawl a significant chunk of the Web, for example, collecting the blogosphere of a country that’s in the news and learning what people are talking about.
- Final project: Gather and analyze a large body of documents, such as looking for a story in a leaked cache of documents.
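The sentiment-analysis homework above can be prototyped with a word list before moving to a toolkit such as NLTK. The sketch below is a deliberately simple lexicon-based scorer; the word lists and the two sample sentences are invented for illustration, and a real analysis would use a published lexicon or a trained model.

```python
import re

# Tiny hand-made lexicon, invented for this example.
POSITIVE = {"praised", "success", "strong", "improved", "welcomed"}
NEGATIVE = {"criticized", "failure", "weak", "worsened", "condemned"}


def sentiment_score(text):
    """Return (positive count - negative count) divided by total words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    positive = sum(word in POSITIVE for word in words)
    negative = sum(word in NEGATIVE for word in words)
    return (positive - negative) / len(words)


# Invented before-and-after versions of a sentence.
before = "Officials praised the program as a strong success."
after = "Officials criticized the program as a weak failure."

print(f"before: {sentiment_score(before):+.3f}")
print(f"after:  {sentiment_score(after):+.3f}")
```

A scorer like this ignores negation and context ("not a success" would still count as positive), which is a useful limitation to discuss in class.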
Advanced Computational Journalism
Course description: This course should reflect the state of computational tools in journalistic practice while looking toward novel applications of emerging and unexplored tools. By this time, students should have already developed a strong foundation of programming and data analysis skills. This class should build on that foundation and encourage in-depth, independent projects centered on reporting stories or developing a piece of software.
Course structure: Meets twice weekly, once for lecture and once for lab.
Tools: Python, Ruby, or a similarly powerful and versatile scripting language. Additionally, physical computing tools like Arduino.
- Homework: Use regular expressions to mine the Congressional Record for a senator’s stated positions on a political issue over the course of his or her career.
- Mock coding interview: In the common style of interviewing for tech jobs, solve a given programming problem on a whiteboard and narrate your line of thinking.
- Final project: An in-depth story reported using an advanced tool, including but not limited to a journalistic algorithm, machine learning, analysis of personally gathered data, or development of a piece of software.
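The regular-expression homework above can be sketched as follows. The transcript is an invented excerpt in a simplified format, so the speaker pattern is an assumption; the real Congressional Record is formatted differently and would need its own patterns. The function name `statements_by` is hypothetical.

```python
import re

# Invented transcript excerpt in a simplified format.
TRANSCRIPT = """\
Mr. DOE. Madam President, I rise to support the clean water bill.
Ms. ROE. I thank my colleague. I oppose the amendment on tax policy.
Mr. DOE. I yield the floor, but I oppose any cut to water funding.
"""

# A speaker turn starts with a title, an all-caps surname, and a period.
TURN = re.compile(r"^(?:Mr|Ms|Mrs)\. ([A-Z]+)\. (.*)$", re.MULTILINE)


def statements_by(speaker, topic, transcript):
    """Return sentences by `speaker` that mention `topic` (case-insensitive)."""
    topic_pattern = re.compile(rf"\b{re.escape(topic)}\b", re.IGNORECASE)
    hits = []
    for name, speech in TURN.findall(transcript):
        if name != speaker:
            continue
        # Split the turn into sentences at end punctuation followed by a space.
        for sentence in re.split(r"(?<=[.!?])\s+", speech):
            if topic_pattern.search(sentence):
                hits.append(sentence)
    return hits


for sentence in statements_by("DOE", "water", TRANSCRIPT):
    print(sentence)
```

Keyword matches only locate candidate passages; deciding what position a statement actually expresses still requires reading it in context.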
Capstone or Thesis Project
A thesis in data journalism will arise, ideally, from work with instructors and an adviser. It could take the form of a reported story, a technical report, a piece of software, or a substantial design piece such as a map or data visualization.
A capstone project for concentrators in data and computation may bring a class of students together on a coordinated project that uses and hones the skills they have developed in their earlier coursework. Each student's work should then be supplemented with an individual contribution such as a reported piece or data visualization.