An open-source, open-data repository for learning and understanding practical data journalism.
- What is Data Journalism?
- Handbooks & guides
- Education & learning
- Data sources
- Data collection tools
- Data cleaning
- Data analysis
- Data visualization
- Interactive storytelling
- Fact-checking & verification
- Newsrooms & publications
- Community & professional networks
- Related resources
- Quick reference: Data journalism workflow
This section is for readers who are new to data journalism.
Data journalism is the practice of using data to find, create, and tell news stories through the systematic collection, analysis, and visualization of structured information to inform the public. Unlike conventional reporting, which relies primarily on interviews and observation, data journalism integrates statistical reasoning, programming, and design into the storytelling process.
The practice rests on three interconnected pillars:
| Pillar | Focus |
|---|---|
| Data acquisition | Finding and extracting relevant datasets from diverse sources |
| Data processing | Cleaning, transforming, and analyzing information |
| Data presentation | Visualizing and contextualizing findings for audiences |
- What is Data Journalism? — Data Journalism Handbook
- Wikipedia: Data-driven journalism
- The Data Journalism Handbook — Open-access guide with global case studies
- The Investigative Reporter's Handbook — Data methods with traditional investigative techniques
- CIJ Data Journalism Book — Centre for Investigative Journalism
- The Functional Art — Alberto Cairo's foundational visualization theory
- Data + Design — Introduction to data and design
- Finding Stories in Spreadsheets — Paul Bradshaw's guide to spreadsheet analysis
- Scraping for Journalists — Web extraction techniques with legal guidance
- Data Journalism Heist — Leanpub
- How Charts Lie — Critical guide to visualization pitfalls
- Knowledge is Beautiful
- The Information Capital
- Organising an Online Investigation Team
- GIJN Guide to Data Journalism — Global investigative journalism resources
- Verification Handbook — Data verification protocols and authentication
- Facts are Sacred — The Guardian Datablog
North America
- Specialization in data @ Columbia Journalism School
- Stanford Journalism Program — Data journalism and storytelling
- Computational Journalism @ Georgia Tech — Technical depth with computer science
Europe
- Data journalism @ City, University of London
- Data Journalism MA @ Tilburg University
- Data journalism @ Sciences Po Paris — Political/economic reporting
Russia & Asia
- Data journalism master's program @ HSE (Higher School of Economics), Russia
- Doing Journalism with Data — European Journalism Centre MOOC (datajournalism.com)
- Python for Journalists — Programming for data cleaning, analysis, and visualization
- Learno.net data courses
- Codecademy — Web API courses
- World Bank Open Data — 200+ countries, 1,400+ indicators
- UN Data — Multiple UN agencies, aggregated portal
- HDX — Humanitarian data, real-time updates
- US Government open data — data.gov
- UK Government open data — data.gov.uk
- Data.europa.eu — EU institutional and open data
- OpenCorporates — Company registries across 140+ jurisdictions
- OpenSecrets — U.S. campaign finance and lobbying
- NASA Earthdata — Satellite imagery and climate variables
- Global Forest Watch — Near-real-time forest change data
- IPUMS — Harmonized census microdata across countries
- Pew Research Center — Download datasets — Public opinion and social trends
- Epstein Exposed — Searchable database of Jeffrey Epstein DOJ case files (full-text search, network graph, REST API)
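Many of the portals above also expose APIs. As a minimal sketch, the World Bank's Indicators API (v2) can be queried by country code and indicator code. The URL pattern, the `SP.POP.TOTL` (total population) indicator code and the two-element JSON response shape are assumptions to check against the API documentation, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of the World Bank Indicators API, version 2.
BASE = "https://api.worldbank.org/v2"


def build_indicator_url(country: str, indicator: str, start: int, end: int,
                        per_page: int = 100) -> str:
    """Return a JSON query URL for one indicator, one country and a year range."""
    path = f"{BASE}/country/{country}/indicator/{indicator}"
    query = urllib.parse.urlencode({
        "format": "json",
        "date": f"{start}:{end}",  # year range; urlencode escapes ':' as %3A
        "per_page": per_page,
    })
    return f"{path}?{query}"


def fetch_rows(url: str) -> list:
    """Download one page of results and keep (year, value) pairs.

    Assumes the response is a two-element list: paging metadata, then rows
    (the rows element may be null when nothing matches).
    """
    with urllib.request.urlopen(url, timeout=30) as resp:
        _meta, rows = json.load(resp)
    return [{"year": r["date"], "value": r["value"]} for r in rows or []]


if __name__ == "__main__":
    # Total population for Brazil, 2010-2020 (no network call is made here).
    print(build_indicator_url("BRA", "SP.POP.TOTL", 2010, 2020))
```

`fetch_rows` reads only the first page of results; larger queries need paging or a bigger `per_page`.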
- Scraping for Journalism: A Guide for Collecting Data — ProPublica
- Making data on the web useful: scraping — School of Data
- HTML Scraping Python Guide with lxml
- Beginner's guide to Web Scraping in Python using BeautifulSoup
- A Guide to Web Scraping Tools
- Web Scraper — Chrome extension, point-and-click, pagination, CSV/JSON export
- Data Miner — Chrome extension and cloud, recipes, scheduling
- ParseHub — Turn dynamic websites into APIs
- Diggernaut — Turn website content into datasets
- Chrome Scraper extension — Simple browser scraper
- Python: Beautiful Soup + Requests, Scrapy, Selenium, Playwright
- R: rvest, RSelenium
- Tabula — Extract tables from PDFs
- Camelot — Python PDF table extraction
- Amazon Textract — ML-based OCR and form recognition
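For a simple static page, the Python scraping libraries listed above follow the same steps: fetch the HTML, locate the table, then walk its rows and cells. The sketch below swaps in the standard library's `html.parser` for Beautiful Soup or lxml so it runs without extra installs. The `TableParser` class and the sample HTML are made up for illustration, and the parser does not handle nested tables, `colspan`, or omitted end tags.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the stripped text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently open, if any
        self._cell = None   # text fragments of the cell currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Ignore whitespace and text that sits outside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Made-up sample; in practice this would be the downloaded page source.
SAMPLE = """
<table>
  <tr><th>District</th><th>Complaints</th></tr>
  <tr><td>North</td><td>1,204</td></tr>
  <tr><td>South</td><td> 987 </td></tr>
</table>
"""

parser = TableParser()
parser.feed(SAMPLE)
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]
print(records)
```

Numbers come out as text (for example `"1,204"`), so converting them is part of the cleaning step that follows.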
- OpenRefine — Dedicated data cleaning; faceted browsing, clustering, GREL, reconciliation
- CSV Lint — Validate CSV against standards
- GoodTables — Data quality validation, type inference, range checking
- Microsoft Excel (Power Query, pivot tables)
- Google Sheets (collaboration, API)
- LibreOffice Calc
- Trifacta Wrangler (cloud-based transformation suggestions)
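OpenRefine's clustering groups values that probably refer to the same entity but are spelled differently. The sketch below imitates its key-collision "fingerprint" method in simplified form (ASCII folding, lowercasing, punctuation removal, sorted unique tokens); treat it as an approximation, not OpenRefine's exact rules. The function names and sample values are hypothetical.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Simplified fingerprint key: values sharing a key are likely variants."""
    # Fold accented characters to ASCII, e.g. "é" becomes "e".
    folded = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode()
    # Lowercase, drop punctuation, and trim surrounding whitespace.
    cleaned = folded.lower().translate(str.maketrans("", "", string.punctuation)).strip()
    # Split into tokens, remove duplicates, sort, and rejoin.
    return " ".join(sorted(set(cleaned.split())))


def cluster(values: list) -> list:
    """Group distinct values by fingerprint; keep only groups with 2+ members."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]


names = ["City of Springfield", "Springfield, City of", "springfield city of",
         "Shelbyville", "Café Nuevo", "Cafe Nuevo"]
for group in cluster(names):
    print(group)
```

Each printed group is a candidate for merging; as in OpenRefine, a human should confirm a cluster before the values are edited.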
- pandas — Data manipulation with DataFrames
- NumPy — Numerical computing
- matplotlib / seaborn — Statistical visualization
- scikit-learn — Machine learning
- statsmodels — Statistical modeling and hypothesis testing
- NLTK / spaCy — Natural language processing
- tidyverse — dplyr, tidyr, readr, purrr
- ggplot2 — Grammar of Graphics visualization
- shiny — Interactive web apps
- sf — Spatial data
- tidytext — Text mining
- Jupyter / JupyterLab
- Quarto — Multi-format publishing (documents, presentations, sites)
- Google Colab — Free GPU/TPU, Google Drive
- Kaggle Notebooks — Competitions and datasets
- Observable — JavaScript-native reactive notebooks
- Posit (RStudio) — R environment with Quarto and Shiny
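A common first analysis step is to normalise raw counts by population before comparing places. The sketch uses plain Python with invented figures; with pandas, the same column could be computed as `df["incidents"] * 100_000 / df["population"]` (column names hypothetical).

```python
from dataclasses import dataclass


@dataclass
class Area:
    name: str
    incidents: int
    population: int

    def rate_per_100k(self) -> float:
        # Incidents per 100,000 residents.
        return self.incidents * 100_000 / self.population


# Invented figures for two places of very different size.
areas = [
    Area("Bigtown", 1_200, 2_000_000),
    Area("Smallville", 90, 50_000),
]

# Ranking by raw count and by rate can point in opposite directions.
by_count = sorted(areas, key=lambda a: a.incidents, reverse=True)
by_rate = sorted(areas, key=lambda a: a.rate_per_100k(), reverse=True)

print("Most incidents:", by_count[0].name)
print("Highest rate:", by_rate[0].name)
for a in by_rate:
    print(f"{a.name}: {a.incidents} incidents, {a.rate_per_100k():.1f} per 100,000")
```

Here the larger town has more incidents, but the smaller one has three times the rate, which is often the more newsworthy comparison.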
- Datawrapper — Accessibility, responsive design, journalistic defaults
- Flourish — Animation, storytelling, 3D, templates
- RAWGraphs — Complex chart types (alluvial, voronoi, sunburst), SVG export
- Tableau Public
- Google Looker Studio — Dashboards, 500+ connectors
- Canva, Piktochart, Venngage, Infogram
- Plotly, Charted, Data Illustrator
- Timeline JS — Knight Lab
- Preceden, Tiki-Toki, Hstry
- D3.js — Data-Driven Documents; maximum flexibility
- Vega-Lite — Declarative grammar
- Observable Plot — Grammar-based, concise
- Chart.js — Lightweight, responsive
- ECharts — Apache; performant, extensive options
- ggplot2 (R), Matplotlib (Python), Bokeh (Python)
- Highcharts, amCharts, r2d3
- RAW — RAWGraphs predecessor; export SVG
- Opendata-tools visualization list
- Chartmaker — comparison of data visualisation tools
- Periodic Table of Visualization Methods
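Declarative libraries such as Vega-Lite describe a chart as data plus encodings rather than as drawing code. As a hedged sketch, a Python dict can be serialised to a Vega-Lite JSON spec. The v5 schema URL, the helper name and the sample values are assumptions; pasting the printed JSON into a Vega-Lite renderer (for example the online Vega Editor) is a quick way to check it.

```python
import json


def bar_chart_spec(rows, category, value, title):
    """Build a minimal Vega-Lite bar chart spec as a plain dict."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "title": title,
        "data": {"values": rows},  # inline data rows
        "mark": "bar",
        "encoding": {
            # Nominal categories on x, sorted by descending y value.
            "x": {"field": category, "type": "nominal", "sort": "-y"},
            "y": {"field": value, "type": "quantitative"},
        },
    }


# Placeholder values for illustration only.
rows = [
    {"district": "North", "complaints": 1204},
    {"district": "South", "complaints": 987},
    {"district": "East", "complaints": 1530},
]

spec = bar_chart_spec(rows, "district", "complaints", "Complaints by district")
print(json.dumps(spec, indent=2))
```

Because the spec is plain JSON, it can be version-controlled next to the analysis code and regenerated whenever the data changes.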
- Scrollama — JavaScript library for scroll-driven narratives
- Idyll — Reactive markup for narrative development
- Shorthand — Hosted platform for longform and team collaboration
- ArcGIS StoryMaps — Map-centric narratives
- Whisper (OpenAI) — Transcription, multilingual, local deployment
- Descript — Text-based audio/video editing, Overdub, collaboration
- Remotion — Programmatic video with React
- Excalidraw — Hand-drawn style diagrams, collaborative
- Figma — Design systems, newsroom workflows
- Miro / Mural — Collaborative whiteboards
- FigJam — Figma-integrated whiteboarding
- Google Images, TinEye, Yandex Images — Reverse image search
- ExifTool — Metadata extraction (camera, GPS, editing)
- Forensically — Error level analysis, clone detection
- Microsoft Video Authenticator, Sensity — Deepfake detection (human judgment and source verification remain essential)
- Google Fact Check Tools — Aggregated fact-checks, API
- Duke Reporters' Lab — Fact-checking database
- Bellingcat Toolkit — Open-source investigation techniques
- Check (Meedan) — Collaborative verification, claim documentation
- DocumentCloud — Document upload, OCR, annotation, publication
- SecureDrop — Secure anonymous source communication
- OpenTimestamps / OriginStamp — Timestamping for tamper-evident documentation
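Timestamping services such as OpenTimestamps work from a cryptographic hash of a file rather than the file itself, and a list of hashes is also useful on its own for showing that documents have not changed since collection. This sketch covers only the hashing step with SHA-256; `manifest` and `sha256_of` are hypothetical helper names, and submitting the digests to a timestamping service is left out.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(folder):
    """Map each file's path (relative to folder) to its SHA-256 digest."""
    root = Path(folder)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


if __name__ == "__main__":
    # Demo on a throwaway folder; point manifest() at real documents instead.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "memo.txt").write_bytes(b"abc")
        for name, hexdigest in manifest(tmp).items():
            print(hexdigest, name)
```

Keeping the manifest alongside the files lets anyone re-run the hashes later: changing a single byte of a file produces a different digest.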
- The New York Times — The Upshot, Graphics Desk, R&D Lab
- The Guardian — Datablog, Visuals team
- ProPublica — Data and Research; open methodology
- NPR Visuals — Audio-centric innovation, accessible design
- Vox — Storytelling Studio, explainers
- The Pudding — Visual essays, experimental formats
- Data Driven Journalism
- Source (OpenNews) — Newsroom technology practice
- Nieman Lab — Journalism innovation
- Columbia Journalism Review
- Digital Journalism and Journalism Practice (peer-reviewed journals)
- A short list of online articles and references on data journalism
- How to get started with GitHub for ~~Dummies~~ Journalists
- Journalism and New media — CARTO
- Tow Center for Digital Journalism (Columbia), Knight Lab (Northwestern), Stanford Computational Journalism Lab
- OpenNews — SRCCON, fellowships, Source
- GIJN — Global Investigative Journalism Network
- ICIJ — International Consortium of Investigative Journalists
- Twitter/X: #datajournalism, #ddj, #infovis — Guardian Data, Data Journalism Blog, Simon Rogers, Paul Bradshaw, Daten Journalist
- Facebook: Data Driven Journalism, Data journalism blog
- LinkedIn: Data Journalism and Investigative Journalists groups
- IRE (Investigative Reporters and Editors) — NICAR conference, training, resource library
- SND (Society for News Design) — Design and visualization awards
- ONA (Online News Association)
- Data Visualization Society (datavisualizationsociety.com)
- Hacks/Hackers (hackshackers.com) — Journalist–technologist meetups
- News Nerdery (Slack), Data Visualization Society (Slack) — Invitation or membership-based
- Global Data Journalists Directory
- MaryJo Webster's training materials
- NICAR (IRE) — Annual data journalism conference, U.S.
- Dataharvest — European investigative journalism conference, Belgium
- International Journalism Festival — April, Perugia, Italy
- CIJ Summer School — July, London, UK
- Malofiej — Infographics and visualization awards and summit, Spain
- Hacks/Hackers chapters — 80+ cities; DVS regional events; national journalism association tracks
- awesome-awesomeness
- awesome-datascience — Data science resources
- awesome-machine-learning
- awesome-dataviz
- awesome-d3 — D3.js resources
- awesome-python
- awesome-R
- awesome-public-datasets
- awesome-opendata-rus — Open data in Russian
- lists
- Data Science IPython Notebooks
- ProPublica Data Store
- FiveThirtyEight Data
- BuzzFeed News GitHub — Investigative data and replication
- Kaggle Datasets
- Google Dataset Search
- Dateno — Dataset search engine; 22+ million open datasets across 5,000+ catalogs worldwide
- Journalism Tools, Data Journalism Tools
- Source Guides — OpenNews
| Step | Question | Key resources |
|---|---|---|
| Learn | What skills do I need? | Data Journalism Handbook, Doing Journalism with Data, NICAR training |
| Find | Where is the data? | Government portals, FOIA, Data sources |
| Clean | Is the data reliable? | OpenRefine, spreadsheets, validation |
| Analyze | What patterns emerge? | Python/pandas, R/tidyverse, Jupyter, statistics |
| Visualize | How do I show findings? | Datawrapper, Flourish, D3.js |
| Publish | How do I tell the story? | Scrollytelling, interactive dashboards, Interactive storytelling |
| Verify | Can others trust this? | Documentation, data publication, methodology transparency, Fact-checking |
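The "Clean" row's question, whether the data is reliable, can start with simple automated checks before a dataset goes into analysis. The sketch validates a CSV against a small hand-written schema of types and ranges, in the spirit of tools like GoodTables but far less complete; the column names, limits and sample rows are invented.

```python
import csv
import io

# Hypothetical schema: expected columns, their types and allowed ranges.
SCHEMA = {
    "school": {"type": str, "required": True},
    "year": {"type": int, "min": 2000, "max": 2030},
    "pass_rate": {"type": float, "min": 0.0, "max": 100.0},
}


def validate(reader):
    """Return a list of human-readable problems found in a csv.DictReader."""
    missing = set(SCHEMA) - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    problems = []
    # Line numbers assume one record per line (no embedded newlines).
    for line_no, row in enumerate(reader, start=2):
        for col, rule in SCHEMA.items():
            raw = (row[col] or "").strip()  # short rows yield None
            if not raw:
                if rule.get("required"):
                    problems.append(f"line {line_no}: {col} is empty")
                continue
            try:
                value = rule["type"](raw)
            except ValueError:
                problems.append(f"line {line_no}: {col}={raw!r} is not {rule['type'].__name__}")
                continue
            too_low = "min" in rule and value < rule["min"]
            too_high = "max" in rule and value > rule["max"]
            if too_low or too_high:
                problems.append(
                    f"line {line_no}: {col}={value} outside [{rule.get('min')}, {rule.get('max')}]"
                )
    return problems


# Invented sample with one out-of-range value and one broken row.
SAMPLE = """school,year,pass_rate
Northside,2023,87.5
Eastside,2023,104
,2023,abc
"""

for problem in validate(csv.DictReader(io.StringIO(SAMPLE))):
    print(problem)
```

Collecting every problem instead of stopping at the first makes it easier to judge whether errors are isolated typos or a sign that the whole source needs a closer look.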
Key sources used in compiling this list include the Data Journalism Handbook, Media Helping Media, and MIT KSJ Data Journalism Tools.