feat(postcodes/CN): 22,656 China Post codes (#1039)#1485

Merged
dr5hn merged 1 commit into master from feat/postcodes-china
Apr 27, 2026
@dr5hn dr5hn commented Apr 27, 2026

Owner

Summary

  • Imports 22,656 China Post 6-digit postcodes for issue #1039 ("Can we add a postcode for this?")
  • 100% state FK resolution across all 31 mainland CSC provinces / municipalities / autonomous regions
  • Hand-curated 60-entry 2-digit prefix → iso2 lookup (no province column in source)

Source

  • mumuy/data_post — MIT, ~45⭐, the largest mature mirror
  • File: list.json (dict keyed by 6-digit postcode → most-specific Chinese district/town)
  • License: MIT (clean redistribution)

State FK strategy

The source has no province column: the 22,656 values are Chinese district/town names that would not reliably name-match against states.json. The PREFIX_TO_ISO2 table instead maps each 2-digit code prefix to one of the 31 CSC CN states; its entries are derived from the XX0000 trunk codes plus a per-prefix province-name vote count.
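
A minimal sketch of the prefix lookup described above. The four entries shown are an illustrative subset based on the well-known municipality trunk codes, not a copy of the PR's 60-entry table, and `resolve_state_iso2` is a hypothetical helper name:

```python
from typing import Optional

# Illustrative subset only (assumption): the real PREFIX_TO_ISO2 has 60 entries.
PREFIX_TO_ISO2 = {
    "10": "BJ",  # Beijing, trunk code 100000
    "20": "SH",  # Shanghai, trunk code 200000
    "30": "TJ",  # Tianjin, trunk code 300000
    "40": "CQ",  # Chongqing, trunk code 400000
}


def resolve_state_iso2(postcode: str) -> Optional[str]:
    """Return the state iso2 for a postcode, or None if its prefix is unmapped."""
    return PREFIX_TO_ISO2.get(postcode[:2])
```

A `None` result marks a row whose state FK cannot be resolved; in this sketch that includes any code starting with an unused prefix such as 14.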

China's 6-digit code structure (per source README):

[province: 2][postal-region: 1][county: 1][delivery: 2]
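
The layout above could be split into its fields roughly as follows. This is a sketch under the stated structure; `PostcodeParts` and `split_postcode` are hypothetical names and are not taken from the import script:

```python
import re
from typing import NamedTuple


class PostcodeParts(NamedTuple):
    province: str       # digits 1-2
    postal_region: str  # digit 3
    county: str         # digit 4
    delivery: str       # digits 5-6


def split_postcode(code: str) -> PostcodeParts:
    """Split a 6-digit code into the four fields of the documented layout."""
    if not re.fullmatch(r"[0-9]{6}", code):
        raise ValueError(f"not a 6-digit postcode: {code!r}")
    return PostcodeParts(code[:2], code[2], code[3], code[4:])
```

Only the first field is needed for the state FK lookup; the remaining fields are shown for completeness.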

Coverage notes

  • Mainland only. HK / MO / TW are separate iso2 countries in CSC and are out of scope here: HK and MO have no postcode system, and TW is to be handled by a future #1039 PR using flying-itmen-eagle/eagle-tw-open-data.
  • Prefix 14 is unused by China Post (no 140000-149999 codes).
  • Top states by row count: SC 1,451 · SD 1,368 · HA 1,284 · YN 1,090 · HE 1,018 · GD 1,004.

Test plan

  • python3 -m py_compile bin/scripts/sync/import_china_postcodes.py
  • All 22,656 codes match ^\d{6}$
  • 100% state_id valid; state.country_id == 45; state_code == state.iso2
  • No auto-managed fields (id, created_at, updated_at, flag)
  • Idempotent merge (re-run produces no diff)
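
The row-level checks and the idempotency check in this plan might look roughly like the sketch below. The `code` field name and both helper functions are assumptions for illustration; the actual script and validator may be structured differently:

```python
import re

# The four auto-managed field names come from the test plan above.
AUTO_MANAGED = {"id", "created_at", "updated_at", "flag"}


def validate_row(row: dict) -> list[str]:
    """Return a list of problems found in one postcode row (empty if clean)."""
    errors = []
    # "code" is an assumed field name for the postcode value.
    if not re.fullmatch(r"\d{6}", str(row.get("code", ""))):
        errors.append("code is not exactly 6 digits")
    leaked = AUTO_MANAGED & row.keys()
    if leaked:
        errors.append(f"auto-managed fields present: {sorted(leaked)}")
    return errors


def merge(existing: list[dict], incoming: list[dict]) -> list[dict]:
    """Merge rows keyed by code; incoming rows win, output is sorted by code."""
    by_code = {row["code"]: row for row in existing}
    by_code.update({row["code"]: row for row in incoming})
    return [by_code[code] for code in sorted(by_code)]
```

Keying the merge on the postcode and emitting rows in sorted order is what makes a re-run a no-op here: merging the same input a second time yields an identical list.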

🤖 Generated with Claude Code

Adds the mumuy/data_post 6-digit postcode dataset (MIT-licensed,
~45⭐) covering all 31 mainland Chinese provinces, autonomous
regions, and direct-administered municipalities.

Why
---
Closes the CN gap on issue #1039. mumuy/data_post is the largest
mature MIT-licensed mirror; the official China Post (中国邮政)
publishes its full postal-code list only via paid API.

Coverage
--------
- 22,656 codes / 100% state FK resolution
- 31 of 34 CSC CN states covered (HK / MO / TW handled as separate
  CSC countries)
- All 60 source 2-digit prefixes mapped via PREFIX_TO_ISO2 (derived
  from XX0000 trunk codes + per-prefix province-name vote count)

State FK strategy
-----------------
Source has no province column — the 22,656 values are district/town
names in Chinese that would not name-match against states.json
reliably. Hand-curated 60-entry 2-digit prefix table is the only
reliable resolver and pulls 100% FK.

License
-------
MIT (clean redistribution). Each row carries
`source: "china-post-via-mumuy"` for export-time attribution.

Validation
----------
- python3 -m py_compile passes
- 100% regex match (^\d{6}$)
- 100% state_id valid + state.country_id == 45 + state_code agrees
- No auto-managed fields (id, created_at, updated_at, flag)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 27, 2026 15:44

Copilot AI left a comment

Copilot wasn't able to review any files in this pull request.

@dosubot dosubot Bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Apr 27, 2026
@dosubot dosubot Bot added the enhancement New feature or request label Apr 27, 2026
@github-actions

Contributor

CSC Validation Report

PR Format

  • ✅ Description provided
  • ✅ Data source linked
  • ✅ Issue linked (recommended for data changes)
  • ✅ Justification / context provided

Labels applied: data:postcodes, large-contribution

⚠️ Large Contribution

This PR contains 22656 records. Large contributions require manual review.

Schema Validation (22656 records)

✅ All records passed validation

Cross-Reference Validation

✅ 45312 reference(s) verified

Source URL Verification

✅ 2 source URL(s) accessible


All checks passed | Status: Ready for review

@dr5hn dr5hn merged commit 7af015f into master Apr 27, 2026
1 check passed
@dr5hn dr5hn deleted the feat/postcodes-china branch April 27, 2026 15:46
Labels

data:postcodes enhancement New feature or request large-contribution ready-for-review size:XS This PR changes 0-9 lines, ignoring generated files.
