
feat(postcodes/DACH): bulk-import 35,596 codes via OpenPLZ (#1039) #1431

Merged: dr5hn merged 1 commit into master from feat/postcodes-dach-bulk on Apr 27, 2026


@dr5hn (Owner) commented Apr 27, 2026

Summary

Second bulk postcode PR after India (#1430). Adds:

  1. bin/scripts/sync/import_openplz_postcodes.py: a pipeline that walks the OpenPLZ REST API hierarchy for Germany, Austria, and Switzerland, paginating through the Localities under each region. The source is licensed ODbL-1.0, an exact match for this repo's licence.

  2. Three new contribution files:

    • contributions/postcodes/DE.json: 12,815 records, 2.5 MB
    • contributions/postcodes/AT.json: 18,722 records, 4.0 MB
    • contributions/postcodes/CH.json: 4,059 records, 840 KB

    Total: 35,596 postcodes across DACH with 100% state_id resolution.
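The hierarchy walk in item 1 can be pictured as a nested loop: for each region, request numbered pages until a short page signals the end. This is a minimal sketch, not the script's code. `fetch_page`, `walk_localities`, 1-based page numbering and the default `page_size` are all hypothetical, and no real OpenPLZ endpoint or HTTP call is assumed; the fetcher is injected so the paging logic stands on its own.

```python
from typing import Callable, Iterator, List

# Hypothetical fetcher: (region_key, page_number) -> one page of locality dicts.
# The real script's endpoints, query parameters and page size are not shown in the PR.
FetchPage = Callable[[str, int], List[dict]]


def walk_localities(region_keys: List[str], fetch_page: FetchPage,
                    page_size: int = 50) -> Iterator[dict]:
    """Yield every locality under every region, paging until a short page."""
    for region in region_keys:
        page = 1
        while True:
            rows = fetch_page(region, page)
            for row in rows:
                # Remember which region the row came from; state resolution needs it later.
                yield {**row, "region_key": region}
            # Fewer rows than page_size (including zero) means this region is exhausted.
            if len(rows) < page_size:
                break
            page += 1
```

Stopping on a short page avoids needing a total-count header. The cost is one extra empty request when a region's size is an exact multiple of the page size.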

State resolution strategy

| Country | Approach | Result |
|---|---|---|
| DE | Exact case-insensitive name match against states.json, with umlaut normalisation. | 100% (16/16 federal states match cleanly) |
| CH | Same as DE, plus translation aliases: Luzern→Lucerne, Genève→Geneva, Basel-Landschaft→Basel-Land, and slash-stripped multilingual names (Fribourg/Freiburg → Fribourg). | 100% (26/26 cantons) |
| AT | Name aliases for translations (Wien→Vienna, Tirol→Tyrol, Steiermark→Styria, Kärnten→Carinthia, Nieder-/Oberösterreich→Lower/Upper Austria). A postcode-prefix fallback handles the rest using Austrian Post's well-documented prefix scheme (1xxx=Wien, 8xxx=Steiermark, 9xxx=Kärnten, …), which is more reliable than fuzzy multilingual matching. | 100% (9/9 provinces) |
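A minimal sketch of the name-and-alias matching described in the table, assuming states.json rows expose `id` and `name`. The alias table is a subset copied from the rows above. `normalise`, `build_index` and `resolve_state` are hypothetical helper names. The umlaut handling shown (stripping diacritics, so ä becomes a) is one plausible reading of "umlaut normalisation"; the script may transliterate ä to ae instead.

```python
import unicodedata
from typing import Dict, List, Optional

# Subset of the translation aliases from the table, keyed by normalised name.
ALIASES = {
    "luzern": "lucerne",
    "geneve": "geneva",
    "basel-landschaft": "basel-land",
    "wien": "vienna",
    "tirol": "tyrol",
    "steiermark": "styria",
    "karnten": "carinthia",
    "niederosterreich": "lower austria",
    "oberosterreich": "upper austria",
}


def normalise(name: str) -> str:
    """Casefold and strip diacritics; keep only the first part of 'A/B' names."""
    first = name.split("/")[0].strip()  # "Fribourg/Freiburg" -> "Fribourg"
    decomposed = unicodedata.normalize("NFKD", first)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return no_marks.casefold()


def build_index(states: List[dict]) -> Dict[str, int]:
    """Map normalised state name -> state id (field names are assumptions)."""
    return {normalise(s["name"]): s["id"] for s in states}


def resolve_state(region_name: str, index: Dict[str, int]) -> Optional[int]:
    key = normalise(region_name)
    key = ALIASES.get(key, key)  # translate, e.g. "wien" -> "vienna"
    return index.get(key)
```

Normalising both sides with the same function means the alias keys only need one spelling each ("karnten" covers both "Kärnten" and "Karnten").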

LI deliberately omitted

contributions/postcodes/LI.json is already curated (#1401) with 13 high-quality 1:1 commune-to-code rows. OpenPLZ's per-code endpoint returns multiple sub-localities per code (e.g. Vaduz-Triesen, Vaduz-Schaan) that would muddy the existing clean mapping.

Validation (zero errors across 35,596 records)

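| Check | DE | AT | CH |
|---|---|---|---|
| Records | 12,815 | 18,722 | 4,059 |
| state_id resolved | 100% | 100% | 100% |
| Codes matching postal_code_regex | ✅ | ✅ | ✅ |
| FKs resolve | ✅ | ✅ | ✅ |
| state_code ↔ state.iso2 agreement | ✅ | ✅ | ✅ |
| No auto-managed fields | ✅ | ✅ | ✅ |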
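The pass/fail rows correspond to simple per-record checks. The sketch below assumes each record carries `code`, `country_id`, `state_id` and `state_code`, that the country row provides `postal_code_regex`, and that states are looked up by id with an `iso2` field. The auto-managed field list comes from the commit message further down. `validate` is a hypothetical name, not the repo's validator.

```python
import re
from typing import Dict, List

# Fields the commit message says must not appear in contributions.
AUTO_MANAGED = {"id", "created_at", "updated_at", "flag"}


def validate(records: List[dict], country: dict, states_by_id: Dict[int, dict]) -> List[str]:
    """Return error strings; an empty list means every record passed."""
    errors = []
    pattern = re.compile(country["postal_code_regex"])
    for i, r in enumerate(records):
        # Codes must match the country's postal_code_regex in full.
        if not pattern.fullmatch(r["code"]):
            errors.append(f"{i}: code {r['code']!r} fails postal_code_regex")
        # Foreign keys: country_id must be this country, state_id must exist.
        if r["country_id"] != country["id"]:
            errors.append(f"{i}: country_id {r['country_id']} does not match")
        state = states_by_id.get(r["state_id"])
        if state is None:
            errors.append(f"{i}: state_id {r['state_id']} not found")
        elif r.get("state_code") != state["iso2"]:
            errors.append(f"{i}: state_code {r.get('state_code')!r} != iso2 {state['iso2']!r}")
        # No auto-managed fields may be present.
        leaked = AUTO_MANAGED.intersection(r)
        if leaked:
            errors.append(f"{i}: auto-managed fields present: {sorted(leaked)}")
    return errors
```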

License & attribution

Cumulative postcode coverage after this lands

| Country | Codes | Source |
|---|---|---|
| IN (#1430) | 19,100 | India Post |
| DE (this PR) | 12,815 | OpenPLZ |
| AT (this PR) | 18,722 | OpenPLZ |
| CH (this PR) | 4,059 | OpenPLZ |
| All earlier merged | ~80 | manual |
| Total | ~54,800 | |
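As an arithmetic check on the table: the three DACH files sum to the headline 35,596. Adding India and the roughly 80 earlier manual rows gives 54,776, which rounds to the ~54,800 total. The 80 is the table's own approximation, so the total is approximate too.

```python
# Counts copied from the coverage table; the 80 for earlier manual rows is approximate.
coverage = {"IN": 19_100, "DE": 12_815, "AT": 18_722, "CH": 4_059, "earlier manual": 80}

dach = coverage["DE"] + coverage["AT"] + coverage["CH"]
total = sum(coverage.values())

print(dach)              # 35596
print(total)             # 54776
print(round(total, -2))  # 54800
```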

Refs: #1039

Adds two things in one PR (the script + its first run):

1. bin/scripts/sync/import_openplz_postcodes.py — pipeline that walks
   the OpenPLZ REST API hierarchy for Germany, Austria, and
   Switzerland, paginating Localities under each region. ODbL-1.0
   licensed source matches this repo's licence exactly.

2. Three new contribution files:
     contributions/postcodes/DE.json (12,815 records, 2.5 MB)
     contributions/postcodes/AT.json (18,722 records, 4.0 MB)
     contributions/postcodes/CH.json ( 4,059 records, 840 KB)

   Total: 35,596 postcodes across DACH with 100% state_id resolution.

How state resolution works
- DE/CH: exact case-insensitive name match against states.json (with
  light umlaut normalisation and a few translation aliases:
  Luzern->Lucerne, Genève->Geneva, Basel-Landschaft->Basel-Land,
  Fribourg/Freiburg->Fribourg, etc.)
- AT: name match catches the directly-spelt provinces (Salzburg,
  Vorarlberg, Burgenland) and aliases handle translations
  (Wien->Vienna, Tirol->Tyrol, Steiermark->Styria, Kärnten->Carinthia,
  Niederösterreich->Lower Austria, Oberösterreich->Upper Austria).
  A postcode-prefix fallback covers the rest (1xxx=Wien, 8xxx=Steiermark,
  9xxx=Kärnten, etc.) — Austrian Post's well-documented prefix scheme
  is more reliable than fuzzy multilingual name matching.
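The AT fallback order (name match first, postcode prefix second) can be sketched as follows. This is illustrative only. `resolve_at_state`, `AT_PREFIX_TO_PROVINCE` and the index shape are hypothetical. Only the three prefixes the message spells out are included, because the "etc." entries are not listed. Alias handling is left out so the fallback stays visible.

```python
from typing import Dict, Optional

# Only the prefixes named in the commit message; other leading digits are left
# unresolved in this sketch because the "etc." entries are not spelled out.
AT_PREFIX_TO_PROVINCE = {"1": "vienna", "8": "styria", "9": "carinthia"}


def resolve_at_state(region_name: str, postcode: str,
                     index: Dict[str, int]) -> Optional[int]:
    """Try the region name first, then fall back on the postcode's leading digit.

    index: hypothetical {casefolded English state name: state_id} from states.json.
    """
    state_id = index.get(region_name.casefold())
    if state_id is not None:
        return state_id
    # A leading digit is a postal routing region, so it is only a fallback.
    province = AT_PREFIX_TO_PROVINCE.get(postcode[:1])
    return index.get(province) if province else None
```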

LI deliberately omitted
- contributions/postcodes/LI.json is already curated (#1401) with 13
  high-quality 1:1 commune-to-code rows. OpenPLZ's per-code endpoint
  returns multiple sub-localities per code (e.g. Vaduz-Triesen,
  Vaduz-Schaan) that would muddy the existing clean mapping.

Validation (zero errors across 35,596 records)
- All codes match countries.postal_code_regex for their ISO2
- All country_id/state_id foreign keys resolve
- All state_code values agree with state.iso2
- No auto-managed fields present (id, created_at, updated_at, flag)
- Idempotent re-runs preserve any future curated rows, matched by (code, locality_name)
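The idempotency point in the last bullet can be illustrated with a merge keyed on (code, locality_name). This is a sketch under an assumption: the commit does not say how curated rows are recognised. Here, any row whose `source` is not "openplz" is treated as curated and left untouched, while earlier imported rows are overwritten. `merge_rows` is a hypothetical name.

```python
from typing import List


def merge_rows(existing: List[dict], imported: List[dict],
               source: str = "openplz") -> List[dict]:
    """Merge freshly imported rows into an existing file, keyed on (code, locality_name)."""
    merged = {(r["code"], r["locality_name"]): r for r in existing}
    for row in imported:
        key = (row["code"], row["locality_name"])
        current = merged.get(key)
        # Assumption: a row from any other source is curated and always wins.
        if current is not None and current.get("source") != source:
            continue
        merged[key] = {**row, "source": source}
    # Stable ordering so a second run with the same input writes an identical file.
    return sorted(merged.values(), key=lambda r: (r["code"], r["locality_name"]))
```

Running the merge twice with the same import yields the same list, which is what makes re-runs safe.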

License & attribution
- Source: OpenPLZ (https://openplzapi.org), ODbL-1.0
- Each row: source: "openplz" for programmatic attribution

Refs: #1039

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 27, 2026 08:01
@dosubot (bot) added the size:XS label ("This PR changes 0-9 lines, ignoring generated files.") Apr 27, 2026

Copilot AI left a comment


Copilot wasn't able to review any files in this pull request.

@github-actions (Contributor)

CSC Validation Report

PR Format

  • ✅ Description provided
  • ✅ Data source linked
  • ✅ Issue linked (recommended for data changes)
  • ✅ Justification / context provided

Labels applied: data:postcodes, large-contribution

⚠️ Large Contribution

This PR contains 35596 records. Large contributions require manual review.

Schema Validation (35596 records)

✅ All records passed validation

Cross-Reference Validation

✅ 71192 reference(s) verified

Source URL Verification


0 errors, 1 warning(s) | Status: Ready for review (with warnings)

@dr5hn dr5hn merged commit 0c54764 into master Apr 27, 2026
1 check passed
@dr5hn dr5hn deleted the feat/postcodes-dach-bulk branch April 27, 2026 08:05

Labels: data:postcodes, enhancement, large-contribution, ready-for-review, size:XS


2 participants