feat(postcodes/IN): bulk-import 19,100 India pincodes via India Post (#1039) #1430

Merged
dr5hn merged 1 commit into master from feat/postcodes-india-bulk on Apr 27, 2026

Owner

@dr5hn dr5hn commented Apr 27, 2026

Summary

First bulk postcode PR using a real automated pipeline. Adds:

  1. bin/scripts/sync/import_india_post_postcodes.py — pipeline ingesting India Post's official pincode CSV from data.gov.in (NDSAP / Open Government Data licence). Picks one canonical record per pincode (Head Office > Sub Office > Branch Office), resolves state via normalised name match, handles edge cases (CHATTISGARH→Chhattisgarh, ORISSA→Odisha, merged DAMAN/DADRA UT, etc.).

  2. contributions/postcodes/IN.json: 19,100 unique pincodes covering all 36 Indian states and union territories.
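
The "one canonical record per pincode" step can be sketched roughly as below. This is an illustration only, not the code in import_india_post_postcodes.py: the column names `pincode` and `officetype` and the labels `H.O`, `S.O`, `B.O` are assumptions about the CSV layout, and `pick_canonical` is a hypothetical name.

```python
from typing import Dict, Iterable

# Lower rank = preferred. The labels and column names are assumptions about
# the source CSV, not confirmed by this PR.
OFFICE_RANK = {"H.O": 0, "S.O": 1, "B.O": 2}


def pick_canonical(rows: Iterable[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Keep one row per pincode, preferring Head > Sub > Branch Office."""
    best = {}  # pincode -> (rank, row)
    for row in rows:
        code = row["pincode"].strip()
        office = row.get("officetype", "").strip().upper()
        # Unknown office types sort after every known type.
        rank = OFFICE_RANK.get(office, len(OFFICE_RANK))
        # Strict "<" means the first row seen wins a tie.
        if code not in best or rank < best[code][0]:
            best[code] = (rank, row)
    return {code: row for code, (_rank, row) in best.items()}


sample_rows = [
    {"pincode": "110001", "officetype": "B.O", "officename": "A"},
    {"pincode": "110001", "officetype": "H.O", "officename": "B"},
    {"pincode": "110001", "officetype": "S.O", "officename": "C"},
    {"pincode": "560001", "officetype": "S.O", "officename": "D"},
]
canonical = pick_canonical(sample_rows)
```

In this sample, `110001` keeps the Head Office row and `560001` keeps its only row.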

Statistics

| Metric | Value |
| --- | --- |
| Records | 19,100 |
| With state_id | 19,095 (99.97%) |
| With coordinates | 101 (CSV has NA for most lat/lng) |
| File size | 3.8 MB JSON |
| Unresolved | 5 (CSV statename: NULL; ship as country-only) |

Source attribution

  • source: "india-post" on every row
  • Source CSV: kthouz/the_smart_recruits archive of the canonical data.gov.in dataset (NDSAP/OGD licence, redistribution permitted)
  • Original publisher: India Post / Department of Posts

Validation

  • ✅ All 19,100 codes match countries.postal_code_regex (^(\d{6})$)
  • ✅ All country_id/state_id foreign keys resolve
  • ✅ All state_code values match the corresponding state.iso2
  • ✅ Zero auto-managed fields (id, created_at, updated_at, flag) present
  • ✅ Idempotent re-runs preserve any future curated rows by code (merge_with_existing keeps existing entries)
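
A minimal sketch of two of the checks above, assuming each row is a dict with a `code` key. The regex and the auto-managed field names come from this PR; the function bodies are assumptions, and this `merge_with_existing` is an illustrative stand-in rather than the script's actual implementation.

```python
import re
from typing import Dict, List

POSTAL_CODE_RE = re.compile(r"^(\d{6})$")  # regex quoted in the PR
AUTO_MANAGED = {"id", "created_at", "updated_at", "flag"}


def validate(rows: List[Dict]) -> List[str]:
    """Return human-readable problems; an empty list means the rows pass."""
    problems = []
    for row in rows:
        code = row.get("code", "")
        # fullmatch avoids accepting a trailing newline, which "$" would allow.
        if not POSTAL_CODE_RE.fullmatch(code):
            problems.append(f"{code!r}: does not match postal_code_regex")
        leaked = AUTO_MANAGED.intersection(row)
        if leaked:
            problems.append(f"{code!r}: auto-managed fields {sorted(leaked)}")
    return problems


def merge_with_existing(existing: List[Dict], imported: List[Dict]) -> List[Dict]:
    """Existing rows win on code collisions, so curated entries survive."""
    merged = {row["code"]: row for row in imported}
    merged.update({row["code"]: row for row in existing})
    return [merged[code] for code in sorted(merged)]


existing = [{"code": "110001", "locality": "curated"}]
imported = [
    {"code": "110001", "locality": "raw"},
    {"code": "560001", "locality": "x"},
]
first_run = merge_with_existing(existing, imported)
second_run = merge_with_existing(first_run, imported)
```

Because existing rows take priority, feeding the output of one run back in as `existing` yields the same list, which is the idempotency property claimed above.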

State name normalisation

The CSV uses uppercase / & / archaic spellings; the normaliser maps to canonical state names:

TELANGANA → Telangana
ANDAMAN & NICOBAR ISLANDS → Andaman and Nicobar Islands
DADRA & NAGAR HAVELI → Dadra and Nagar Haveli and Daman and Diu (merged 2020)
DAMAN & DIU → Dadra and Nagar Haveli and Daman and Diu (merged 2020)
CHATTISGARH → Chhattisgarh
ORISSA → Odisha
PONDICHERRY → Puducherry
UTTARANCHAL → Uttarakhand
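
The mapping above could be implemented along these lines. This is a sketch under assumptions: the alias table mirrors the list in this description, but `normalise`, `build_index` and `resolve_state` are hypothetical names, and the state ids in the sample are made up for illustration.

```python
import re
from typing import Dict, List, Optional

# Archaic or pre-merger spellings mapped to the normalised canonical key.
ALIASES = {
    "chattisgarh": "chhattisgarh",
    "orissa": "odisha",
    "pondicherry": "puducherry",
    "uttaranchal": "uttarakhand",
    "dadra and nagar haveli": "dadra and nagar haveli and daman and diu",
    "daman and diu": "dadra and nagar haveli and daman and diu",
}


def normalise(name: str) -> str:
    """Lowercase, expand '&' to 'and', collapse whitespace, apply aliases."""
    key = name.lower().replace("&", " and ")
    key = re.sub(r"\s+", " ", key).strip()
    return ALIASES.get(key, key)


def build_index(states: List[Dict]) -> Dict[str, Dict]:
    """Index canonical states by their normalised name."""
    return {normalise(state["name"]): state for state in states}


def resolve_state(csv_name: str, index: Dict[str, Dict]) -> Optional[Dict]:
    """Return the matching state, or None (e.g. for a NULL statename)."""
    return index.get(normalise(csv_name))


# Hypothetical ids, for illustration only.
sample_states = [
    {"id": 1, "name": "Odisha"},
    {"id": 2, "name": "Dadra and Nagar Haveli and Daman and Diu"},
    {"id": 3, "name": "Andaman and Nicobar Islands"},
]
index = build_index(sample_states)
```

A `None` result corresponds to the rows that ship with country_id only.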

Roadmap impact

This validates the bulk-pipeline pattern from #1427 against real data. Same shape will work for:

Refs: #1039

…1039)

Adds two things in one PR (the script + its first run):

1. bin/scripts/sync/import_india_post_postcodes.py: pipeline that
   ingests India Post's official pincode CSV from data.gov.in (Open
   Government Data / NDSAP licence). Picks one canonical record per
   pincode (preferring Head Office > Sub Office > Branch Office),
   resolves state via normalised name match against states.json,
   handles common edge cases (CHATTISGARH→Chhattisgarh,
   ORISSA→Odisha, DAMAN/DADRA→merged UT, etc.).

2. contributions/postcodes/IN.json: 19,100 unique pincodes covering
   all 36 Indian states and union territories. 99% (19,095) resolve
   to a state_id; the 5 unresolved have NULL state names in the
   source CSV and ship with country_id only.

Statistics
- Records:        19,100
- With state_id:  19,095 (99%)
- With coords:    101 (CSV has NA for most lat/lng)
- File size:      3.8 MB JSON

Source attribution
- source: "india-post" set on every row
- Source CSV: kthouz/the_smart_recruits archive of the canonical
  data.gov.in dataset (NDSAP/OGD licence, redistribution permitted)

Validation
- All 19,100 codes match countries.postal_code_regex (^(\d{6})$)
- All country_id/state_id FKs resolve
- All state_code values match the corresponding state.iso2
- Zero auto-managed fields present
- Idempotent re-runs preserve any future curated rows by code

Refs: #1039

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 27, 2026 07:42

Copilot AI left a comment

Copilot wasn't able to review any files in this pull request.

@dosubot dosubot Bot added the size:XS label (This PR changes 0-9 lines, ignoring generated files.) Apr 27, 2026
@github-actions
Contributor

CSC Validation Report

PR Format

  • ✅ Description provided
  • ✅ Data source linked
  • ✅ Issue linked (recommended for data changes)
  • ✅ Justification / context provided

Labels applied: data:postcodes, large-contribution

⚠️ Large Contribution

This PR contains 19100 records. Large contributions require manual review.

Schema Validation (19100 records)

✅ All records passed validation

Cross-Reference Validation

✅ 38195 reference(s) verified

Geo-Bounds Check

✅ All 101 coordinate(s) within expected country bounds


✅ All checks passed | Status: Ready for review

@dosubot dosubot Bot added the automated and enhancement (New feature or request) labels Apr 27, 2026
@dr5hn dr5hn merged commit a158823 into master Apr 27, 2026
1 check passed
@dr5hn dr5hn deleted the feat/postcodes-india-bulk branch April 27, 2026 07:49

Labels

automated, data:postcodes, enhancement (New feature or request), large-contribution, ready-for-review, size:XS (This PR changes 0-9 lines, ignoring generated files.)

2 participants