infoculture/awesome-datajournalism

Awesome Data Journalism

An open-source, open-data repository for learning and understanding practical data journalism.

What is Data Journalism?

This part is for humans who are new to Data Journalism.

Data journalism is the practice of using data to find, create, and tell news stories through the systematic collection, analysis, and visualization of structured information to inform the public. Unlike conventional reporting, which relies primarily on interviews and observation, data journalism integrates statistical reasoning, programming, and design into the storytelling process.

The practice rests on three interconnected pillars:

| Pillar | Focus |
|---|---|
| Data acquisition | Finding and extracting relevant datasets from diverse sources |
| Data processing | Cleaning, transforming, and analyzing information |
| Data presentation | Visualizing and contextualizing findings for audiences |

Handbooks & guides

Foundational handbooks

Specialized manuals

Investigative methods


Education & learning

Formal education

North America

Europe

Russia & Asia

MOOCs and online learning


Data sources

Government and international open data

Specialized databases and APIs


Data collection tools

Guides and tutorials

Browser-based and no-code scrapers

Programming libraries

PDF and document parsers


Data cleaning

  • OpenRefine — Dedicated data cleaning; faceted browsing, clustering, GREL, reconciliation
  • CSV Lint — Validate CSV against standards
  • GoodTables — Data quality validation, type inference, range checking
  • Spreadsheet tools: Microsoft Excel (Power Query, pivot tables), Google Sheets (collaboration, API), LibreOffice Calc
  • Trifacta Wrangler — Cloud-based transformation suggestions
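
To make the cleaning tasks above concrete, here is a minimal Python sketch using only the standard library. It is loosely modeled on two ideas mentioned in the list: key-collision clustering of spelling variants (the general idea behind OpenRefine's clustering, not its actual implementation) and row-level validation with a type and range check (in the spirit of CSV Lint and GoodTables, not their APIs). The function names, the sample data, and the `turnout_pct` column are all hypothetical.

```python
import csv
import io
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a comparison key: strip accents, lowercase, drop punctuation, sort unique tokens."""
    ascii_text = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    tokens = re.sub(r"[^\w\s]", "", ascii_text.lower()).split()
    return " ".join(sorted(set(tokens)))


def cluster_variants(values):
    """Group values that share a fingerprint; keep only groups with several distinct spellings."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


def validate_rows(csv_text, numeric_column, minimum, maximum):
    """Return (line_number, message) pairs for ragged rows and bad numeric values."""
    problems = []
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    column_index = header.index(numeric_column)
    for row in reader:
        line_number = reader.line_num  # physical line of the row just read
        if len(row) != len(header):
            problems.append((line_number, "wrong number of columns"))
            continue
        try:
            number = float(row[column_index])
        except ValueError:
            problems.append((line_number, f"{numeric_column} is not numeric"))
            continue
        if not minimum <= number <= maximum:
            problems.append((line_number, f"{numeric_column} out of range"))
    return problems


# Hypothetical sample data, invented for illustration only.
SAMPLE = """city,turnout_pct
São Paulo,61.5
Sao Paulo,n/a
"sao paulo.",140
Lima,58.2,extra
"""

if __name__ == "__main__":
    print(cluster_variants(["São Paulo", "Sao Paulo", "sao paulo.", "Lima"]))
    for line_number, message in validate_rows(SAMPLE, "turnout_pct", 0, 100):
        print(f"line {line_number}: {message}")
```

A script like this does not replace the dedicated tools; it only shows the kind of checks they automate. Clusters should still be reviewed by a person before values are merged.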

Data analysis

Python ecosystem

R ecosystem

  • tidyverse — dplyr, tidyr, readr, purrr
  • ggplot2 — Grammar of Graphics visualization
  • shiny — Interactive web apps
  • sf — Spatial data
  • tidytext — Text mining

Notebooks and interactive computing


Data visualization

Online chart and graph builders

Timelines

Code-based visualization libraries


Interactive storytelling

Scrollytelling and narrative platforms

  • Scrollama — JavaScript library for scroll-driven narratives
  • Idyll — Reactive markup for narrative development
  • Shorthand — Hosted platform for longform and team collaboration
  • ArcGIS StoryMaps — Map-centric narratives

Immersive and 3D

  • Three.js — WebGL-based 3D for globes and scenes
  • A-Frame — WebXR/VR with HTML-like markup

Audio and video

  • Whisper (OpenAI) — Transcription, multilingual, local deployment
  • Descript — Text-based audio/video editing, Overdub, collaboration
  • Remotion — Programmatic video with React

Annotation and diagramming

  • Excalidraw — Hand-drawn style diagrams, collaborative
  • Figma — Design systems, newsroom workflows
  • Miro / Mural — Collaborative whiteboards
  • FigJam — Figma-integrated whiteboarding

Fact-checking & verification

Image and media verification

Claim and source verification

Data integrity and provenance


Newsrooms & publications

Data journalism desks and outlets

  • The New York Times — The Upshot, Graphics Desk, R&D Lab
  • The Guardian — Datablog, Visuals team
  • ProPublica — Data and Research; open methodology
  • NPR Visuals — Audio-centric innovation, accessible design
  • Vox — Storytelling Studio, explainers
  • The Pudding — Visual essays, experimental formats

Industry and academic publications

Research centers and institutes

  • Tow Center for Digital Journalism (Columbia), Knight Lab (Northwestern), Stanford Computational Journalism Lab
  • OpenNews — SRCCON, fellowships, Source
  • GIJN — Global Investigative Journalism Network
  • ICIJ — International Consortium of Investigative Journalists

Community & professional networks

Social media and hashtags

Discussion platforms and associations

Conferences and events


Related resources

Other awesome lists

Curated datasets and tool directories


Quick reference: Data journalism workflow

| Step | Question | Key resources |
|---|---|---|
| Learn | What skills do I need? | Data Journalism Handbook, Doing Journalism with Data, NICAR training |
| Find | Where is the data? | Government portals, FOIA, Data sources |
| Clean | Is the data reliable? | OpenRefine, spreadsheets, validation |
| Analyze | What patterns emerge? | Python/pandas, R/tidyverse, Jupyter, statistics |
| Visualize | How do I show findings? | Datawrapper, Flourish, D3.js |
| Publish | How do I tell the story? | Scrollytelling, interactive dashboards, Interactive storytelling |
| Verify | Can others trust this? | Documentation, data publication, methodology transparency, Fact-checking |
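
As a rough illustration of the Clean, Analyze, and Verify steps in the workflow above, the following standard-library Python sketch loads a small dataset, sets aside rows that fail to parse, computes a per-capita rate, and records a checksum of the source data so readers can confirm they are looking at the same file. The dataset, the column names, and the function names are hypothetical; a real project would more likely use pandas or R, as listed in the table.

```python
import csv
import hashlib
import io
import statistics

# Hypothetical dataset, invented for illustration only.
RAW_CSV = """district,population,incidents
North,12000,36
South,8000,40
East,15000,30
West,,12
"""


def load_clean_rows(csv_text):
    """Clean: keep rows whose numbers parse; list the districts that were dropped."""
    kept, dropped = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            population = int(row["population"])
            incidents = int(row["incidents"])
            if population <= 0:
                raise ValueError("population must be positive")
        except (TypeError, ValueError):
            dropped.append(row["district"])
            continue
        kept.append({"district": row["district"], "population": population, "incidents": incidents})
    return kept, dropped


def incidents_per_1000(rows):
    """Analyze: normalize raw counts by population so districts are comparable."""
    return {r["district"]: round(1000 * r["incidents"] / r["population"], 2) for r in rows}


def methodology_note(csv_text, dropped, rates):
    """Verify: record what was excluded and a SHA-256 checksum of the exact source text."""
    return {
        "source_sha256": hashlib.sha256(csv_text.encode("utf-8")).hexdigest(),
        "rows_dropped": dropped,
        "median_rate": statistics.median(rates.values()),
    }


if __name__ == "__main__":
    rows, dropped = load_clean_rows(RAW_CSV)
    rates = incidents_per_1000(rows)
    print(rates)
    print(methodology_note(RAW_CSV, dropped, rates))
```

Publishing the dropped rows and the checksum alongside the story is one simple way to support the methodology transparency named in the Verify step.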

For key sources used in this list, see e.g. Data Journalism Handbook, Media Helping Media, MIT KSJ Data Journalism Tools.
