feat(postcodes/CZ): 2,695 Česká pošta PSČ codes (#1039)#1512

Merged: dr5hn merged 1 commit into master from feat/postcodes-czech-republic on May 5, 2026

Conversation

@dr5hn
Owner

@dr5hn dr5hn commented May 4, 2026

Summary

  • Imports the Czech Republic's full 5-digit PSČ list (2,695 codes) for issue #1039 ("Can we add a postcode for this?")
  • 100% state FK resolution across 76 districts + Praha capital city
  • Resolves the research-doc Tier B note: "Only ships a Perl scraper for the 2007 stamps DB; would need to run scraper + fetch from Česká pošta b2b (memory: TLS handshake fails)"

Source

Why this source (not soit-sk)

The previously tracked soit-sk/czech_republic_post_codes_2007 shipped only a Perl scraper that requires Česká pošta b2b TLS access (blocked from this harness, per memory). 1nfinity84's mirror is a static JSON join: no scraping is needed, and it is refreshed whenever upstream re-publishes.

State FK strategy

Direct district-name match against CSC's 76 okres entries plus a single alias:

  • 'Praha' → 'Praha, Hlavní město' (CSC iso2 10, the capital city, which is administered separately from the surrounding Praha-východ/Praha-západ districts)

For PSČs whose source value is an array (multiple districts share the same PSČ), the importer picks the first district as the primary state.
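
The matching rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in `import_czech_postcodes.py`: the function name `resolve_state`, the `states_by_name` lookup, and the placeholder `id` values are all assumptions made for the example.

```python
# Hypothetical sketch: names and data shapes are assumptions, not the PR's code.
ALIASES = {"Praha": "Praha, Hlavní město"}  # the single alias described above


def resolve_state(source_value, states_by_name):
    """Return the CSC state record for a PSČ's district value, or None.

    source_value:   a district name, or a list of names when several
                    districts share one PSČ (the first is taken as primary).
    states_by_name: dict mapping CSC state name -> state record.
    """
    if isinstance(source_value, list):
        if not source_value:
            return None
        source_value = source_value[0]  # first district wins
    name = ALIASES.get(source_value, source_value)
    return states_by_name.get(name)


# Placeholder lookup; the ids are invented, the iso2 values come from the PR.
states = {
    "Praha, Hlavní město": {"id": 1, "iso2": "10"},
    "Brno-venkov": {"id": 2, "iso2": "643"},
    "Opava": {"id": 3, "iso2": "805"},
}

print(resolve_state("Praha", states))                    # alias applied
print(resolve_state(["Brno-venkov", "Opava"], states))   # first entry used
```

Returning `None` for an unmatched name keeps the failure visible to the caller, which is one way a 100% resolution check could be enforced.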

Locality

Each record carries a locality_name derived from the source's psc_to_obec list. Parenthetical fragments such as "(část)" (meaning "part of") or "(Praha 10)" are stripped for readability.
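
One plausible way to strip those parenthetical fragments is a small regular expression. The helper name `clean_locality` and the sample inputs are assumptions for illustration; the actual script may handle this differently.

```python
import re

# Matches a parenthesised fragment together with any whitespace before it.
_PAREN = re.compile(r"\s*\([^)]*\)")


def clean_locality(raw):
    """Remove parenthetical fragments and tidy the remaining whitespace."""
    without_parens = _PAREN.sub("", raw)
    return re.sub(r"\s{2,}", " ", without_parens).strip()


print(clean_locality("Praha (Praha 10)"))   # Praha
print(clean_locality("Vinohrady (část)"))   # Vinohrady
```

Note that this pattern does not handle nested parentheses; that is assumed not to occur in the source data.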

Distribution (top 5)

| iso2 | district | rows |
|------|----------|------|
| 643 | Brno-venkov | 68 |
| 10 | Praha (Hlavní město) | 63 |
| 805 | Opava | 57 |
| 525 | Trutnov | 55 |
| 645 | Hodonín | 55 |

Test plan

  • python3 -m py_compile bin/scripts/sync/import_czech_postcodes.py
  • All 2,695 codes match ^\d{3}\s?\d{2}$
  • 100% state_id valid; state.country_id == 58; state_code == state.iso2
  • No auto-managed fields (id, created_at, updated_at, flag)
  • Idempotent merge (re-run produces no diff)
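
The checks in this test plan could be expressed along the following lines. This is a sketch under stated assumptions: the `code` field name, the `validate` and `merge` helpers, and the choice to key the merge on `code` alone are hypothetical and not taken from the repository.

```python
import re

PSC_RE = re.compile(r"^\d{3}\s?\d{2}$")          # pattern quoted in the test plan
AUTO_MANAGED = {"id", "created_at", "updated_at", "flag"}


def validate(record):
    """Return a list of problems for one postcode record (empty if clean)."""
    problems = []
    if not PSC_RE.match(record.get("code", "")):
        problems.append("code does not match PSČ pattern")
    leaked = AUTO_MANAGED & record.keys()
    if leaked:
        problems.append("auto-managed fields present: " + ", ".join(sorted(leaked)))
    return problems


def merge(existing, incoming):
    """Merge incoming records into existing ones, keyed by code (assumed key).

    Incoming records overwrite existing ones with the same code, and the
    result is sorted, so re-running with the same input changes nothing.
    """
    by_code = {r["code"]: r for r in existing}
    for r in incoming:
        by_code[r["code"]] = r
    return [by_code[c] for c in sorted(by_code)]


records = [{"code": "110 00"}, {"code": "60200"}]
first = merge([], records)
second = merge(first, records)
print(first == second)  # True: the second run produces no diff
```

Sorting the merged output is a design choice that keeps the serialized file stable between runs, which is what makes a "no diff" check meaningful.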

🤖 Generated with Claude Code

Adds the full Czech 5-digit PSČ (poštovní směrovací číslo / postal
code) list joined with okres (district) and obec (municipality)
data from the 1nfinity84/PSC-Okres-Obec-OkresCZ mirror.

Why
---
Closes the CZ gap on issue #1039. The previously-tracked
soit-sk/czech_republic_post_codes_2007 source shipped only a Perl
scraper for the 2007 stamps DB and required Česká pošta b2b TLS
access (blocked from this harness). 1nfinity84's mirror is a
static JSON join requiring no scraping.

Coverage
--------
- 2,695 codes / 100% state FK
- 77 of 90 CSC CZ states covered (76 districts + Praha capital city)

State FK strategy
-----------------
Direct district-name match against CSC's 76 okres entries plus a
single alias 'Praha' -> 'Praha, Hlavní město' (CSC iso2 '10', the
capital city which is administered separately from the surrounding
Praha-východ/Praha-západ districts).

For PSCs whose source value is an array (multiple districts share
the same PSC), picks the first as primary state.

Locality
--------
Each record carries a locality_name derived from the source's
psc_to_obec list. Parenthetical fragments like '(část)' (part of)
or '(Praha 10)' are stripped for readability.

License
-------
1nfinity84/PSC-Okres-Obec-OkresCZ: no formal LICENSE file.
Upstream chain: Česká pošta + ČSÚ open lookups -> rotten77's SQL
dump -> 1nfinity84's static JSON join.
Tier 5 per #1039 license-tier policy.
Each row: source: "ceska-posta-via-1nfinity84"

Validation
----------
- python3 -m py_compile passes
- 100% regex match (^\d{3}\s?\d{2}$)
- 100% state_id valid + state.country_id == 58 + state_code agrees
- No auto-managed fields (id, created_at, updated_at, flag)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dosubot dosubot Bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label May 4, 2026
@dosubot dosubot Bot added the enhancement New feature or request label May 4, 2026
@github-actions
Contributor

github-actions Bot commented May 4, 2026

CSC Validation Report

PR Format

  • ✅ Description provided
  • ✅ Data source linked
  • ✅ Issue linked (recommended for data changes)
  • ✅ Justification / context provided

Labels applied: data:postcodes, large-contribution

⚠️ Large Contribution

This PR contains 2695 records. Large contributions require manual review.

Schema Validation (2695 records)

✅ All records passed validation

Cross-Reference Validation

✅ 5390 reference(s) verified

Source URL Verification

✅ 2 source URL(s) accessible


All checks passed | Status: Ready for review

@dr5hn dr5hn merged commit 731edfb into master May 5, 2026
1 check passed
@dr5hn dr5hn deleted the feat/postcodes-czech-republic branch May 5, 2026 11:09

Labels

data:postcodes enhancement New feature or request large-contribution ready-for-review size:XS This PR changes 0-9 lines, ignoring generated files.
