feat(postcodes/IN): bulk-import 19,100 India pincodes via India Post (#1039) #1430
Merged
…1039)

Adds two things in one PR (the script + its first run):

1. bin/scripts/sync/import_india_post_postcodes.py: pipeline that ingests India Post's official pincode CSV from data.gov.in (Open Government Data / NDSAP licence). Picks one canonical record per pincode (preferring Head Office > Sub Office > Branch Office), resolves state via normalised name match against states.json, and handles common edge cases (CHATTISGARH → Chhattisgarh, ORISSA → Odisha, DAMAN/DADRA → merged UT, etc.).
2. contributions/postcodes/IN.json: 19,100 unique pincodes covering all 36 Indian states and union territories. 99% (19,095) resolve to a state_id; the 5 unresolved have NULL state names in the source CSV and ship with country_id only.

Statistics
- Records: 19,100
- With state_id: 19,095 (99%)
- With coords: 101 (CSV has NA for most lat/lng)
- File size: 3.8 MB JSON

Source attribution
- source: "india-post" set on every row
- Source CSV: kthouz/the_smart_recruits archive of the canonical data.gov.in dataset (NDSAP/OGD licence, redistribution permitted)

Validation
- All 19,100 codes match countries.postal_code_regex (^(\d{6})$)
- All country_id/state_id FKs resolve
- All state_code values match the corresponding state.iso2
- Zero auto-managed fields present
- Idempotent re-runs preserve any future curated rows by code

Refs: #1039
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
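
The canonical-record selection and state-name matching described in the commit message above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the script's actual code: the column names pincode and officetype, the H.O / S.O / B.O labels, and the helper names normalise_state and pick_canonical are hypothetical, chosen only for the example.

```python
import re

# Rank of each office type; lower is preferred. The "H.O" / "S.O" / "B.O"
# labels are an assumption about how the CSV encodes office type.
OFFICE_RANK = {"H.O": 0, "S.O": 1, "B.O": 2}

# Aliases for the archaic spellings named in this PR (keys already normalised).
STATE_ALIASES = {"chattisgarh": "chhattisgarh", "orissa": "odisha"}


def normalise_state(name):
    """Lower-case, turn '&' into 'and', collapse whitespace, apply aliases."""
    if not name or name.strip().upper() in {"NULL", "NA"}:
        return None  # unresolved: the row would ship with country_id only
    key = re.sub(r"\s+", " ", name.replace("&", " and ")).strip().lower()
    return STATE_ALIASES.get(key, key)


def pick_canonical(rows):
    """Keep one row per pincode, preferring Head > Sub > Branch Office."""
    best = {}
    for row in rows:
        code = row["pincode"]
        # Unknown office types rank after all known ones.
        rank = OFFICE_RANK.get(row["officetype"], len(OFFICE_RANK))
        if code not in best or rank < best[code][0]:
            best[code] = (rank, row)
    return {code: row for code, (rank, row) in best.items()}
```

The normalised key would then be looked up against the same normalisation applied to the names in states.json; that lookup is omitted here because its structure is not shown in this PR.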
Summary
First bulk postcode PR using a real automated pipeline. Adds:
- bin/scripts/sync/import_india_post_postcodes.py: pipeline ingesting India Post's official pincode CSV from data.gov.in (NDSAP / Open Government Data licence). Picks one canonical record per pincode (Head Office > Sub Office > Branch Office), resolves state via normalised name match, and handles edge cases (CHATTISGARH → Chhattisgarh, ORISSA → Odisha, merged DAMAN/DADRA UT, etc.).
- contributions/postcodes/IN.json: 19,100 unique pincodes covering all 36 Indian states and union territories.

Statistics
- With state_id: 19,095 (99%)
- Unresolved: 5 (statename: NULL → ship as country-only)

Source attribution
- source: "india-post" on every row
- Source CSV: kthouz/the_smart_recruits archive of the canonical data.gov.in dataset (NDSAP/OGD licence, redistribution permitted)

Validation
- All 19,100 codes match countries.postal_code_regex (^(\d{6})$)
- All country_id / state_id foreign keys resolve
- All state_code values match the corresponding state.iso2
- Zero auto-managed fields (id, created_at, updated_at, flag) present
- Idempotent re-runs (merge_with_existing keeps existing entries)

State name normalisation
The CSV uses uppercase / & / archaic spellings; the normaliser maps to canonical state names:

Roadmap impact
This validates the bulk-pipeline pattern from #1427 against real data. Same shape will work for:
Refs: #1039
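
For reviewers who want to reproduce the validation and idempotency claims locally, the checks listed under Validation might look something like the sketch below. It assumes each record is a dict keyed by code, as the commit message implies. The names validate and merge_keep_existing are hypothetical stand-ins for illustration only, and merge_keep_existing is not the script's merge_with_existing. Foreign-key and state_code checks are left out because they depend on the layout of the countries and states data, which is not shown here.

```python
import re

PINCODE_RE = re.compile(r"^(\d{6})$")  # pattern quoted in this PR
AUTO_MANAGED = {"id", "created_at", "updated_at", "flag"}


def validate(records):
    """Return a list of human-readable problems; empty means all checks pass."""
    problems = []
    for rec in records:
        code = rec.get("code", "")
        # fullmatch rejects a trailing newline that a plain match would allow.
        if not PINCODE_RE.fullmatch(code):
            problems.append(f"{code!r}: does not match the pincode pattern")
        leaked = AUTO_MANAGED & rec.keys()
        if leaked:
            problems.append(f"{code!r}: auto-managed fields {sorted(leaked)}")
    return problems


def merge_keep_existing(existing, imported):
    """Merge by code; rows already in `existing` win, so re-runs change nothing."""
    merged = {rec["code"]: rec for rec in imported}
    merged.update({rec["code"]: rec for rec in existing})
    return sorted(merged.values(), key=lambda rec: rec["code"])
```

Letting existing rows win is what makes a second run a no-op: merging the imported rows into an already-merged file returns the same list, and any hand-curated row for a code survives later imports.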