fix(IT): drop 88 placeholder Provincia rows (#1349 follow-up)#1482

Merged
dr5hn merged 2 commits into master from agent-1-1777302530
Apr 27, 2026

dr5hn (Owner) commented Apr 27, 2026

Refs #1349.

Drops 87 placeholder Provincia … rows (ids 59104–59190 inclusive) from contributions/cities/IT.json. These are not real comuni — they were province-level pseudo-cities left over from the pre-#1395 schema, exactly what the issue reporter flagged ("Provincia di Lucca don't have sense to exist"). After the #1395 remap, every real comune already resolves to its province via state_id / state_code, so the placeholders are duplicate concepts.

| | Before | After |
|---|---|---|
| IT.json rows | 9,947 | 9,860 |
| Provincia … rows | 87 | 0 |

Note: the brief said "88 rows, 9,941 → 9,853". Verified counts on master are 87 / 9,947 → 9,860 (range arithmetic 59190 − 59104 + 1 = 87). Going with the numbers from the file.

Implementation

bin/scripts/fixes/italy_drop_provincia_placeholders.py — double-predicate filter (id in range AND name starts with Provincia ). Idempotent. Refuses to touch unfamiliar rows in the id range.
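
The double-predicate filter described above could look roughly like the following. This is a minimal sketch and not the contents of the actual script: the function name `drop_placeholders` and the choice to raise `ValueError` on an unfamiliar row are assumptions made for illustration. Only the id range and the `Provincia ` prefix come from this description.

```python
# Illustrative sketch only; names and error handling are assumptions,
# not the real italy_drop_provincia_placeholders.py.
ID_MIN, ID_MAX = 59104, 59190   # inclusive id range quoted in this PR
NAME_PREFIX = "Provincia "      # note the trailing space


def drop_placeholders(rows):
    """Return (kept_rows, dropped_count) for a list of city dicts.

    A row is dropped only when BOTH predicates hold. A row inside the id
    range whose name lacks the prefix is treated as unfamiliar, and the
    function raises instead of silently deleting or keeping it.
    """
    kept, dropped = [], 0
    for row in rows:
        in_range = ID_MIN <= row["id"] <= ID_MAX
        is_placeholder = row["name"].startswith(NAME_PREFIX)
        if in_range and is_placeholder:
            dropped += 1
        elif in_range:
            raise ValueError(
                f"unfamiliar row in id range: {row['id']} {row['name']!r}"
            )
        else:
            kept.append(row)
    return kept, dropped


if __name__ == "__main__":
    sample = [
        {"id": 59103, "name": "Neighbour before"},
        {"id": 59104, "name": "Provincia di Lucca"},
        {"id": 59191, "name": "Neighbour after"},
    ]
    kept, dropped = drop_placeholders(sample)
    print(dropped, [r["id"] for r in kept])
    # Running it again on the output drops nothing, which is the
    # idempotence property claimed above.
    print(drop_placeholders(kept)[1])
```

Requiring both predicates means a renumbered or newly added real comune that happens to land in the id range stops the run rather than being deleted.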

Validation

  • jq '. | length' → 9,860
  • jq '[.[] | select(.name | startswith("Provincia "))] | length' → 0
  • No parent_id refs to dropped ids; neighbour ids 59103/59191 preserved.
  • python3 -m json.tool parses cleanly; normalize_json.py is a no-op.
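
For readers without `jq`, the first three checks can be approximated in Python. This is a hedged sketch: the helper `validate` and its parameters are hypothetical, and it assumes each row is a dict with `id`, `name`, and an optional `parent_id` key.

```python
def validate(rows, dropped_ids, expected_len, neighbour_ids):
    """Return a list of problems; an empty list means every check passed.

    Hypothetical helper mirroring the checklist above, not part of the PR.
    """
    problems = []
    if len(rows) != expected_len:
        problems.append(f"expected {expected_len} rows, found {len(rows)}")

    leftovers = [r["id"] for r in rows if r.get("name", "").startswith("Provincia ")]
    if leftovers:
        problems.append(f"placeholder rows still present: {leftovers}")

    # No surviving row may point at a dropped id through parent_id.
    dangling = [r["id"] for r in rows if r.get("parent_id") in dropped_ids]
    if dangling:
        problems.append(f"rows with parent_id in dropped range: {dangling}")

    # The ids on either side of the dropped range must still exist.
    present = {r["id"] for r in rows}
    missing = sorted(neighbour_ids - present)
    if missing:
        problems.append(f"neighbour ids missing: {missing}")
    return problems


if __name__ == "__main__":
    demo = [
        {"id": 59103, "name": "Neighbour before"},
        {"id": 59191, "name": "Neighbour after", "parent_id": None},
    ]
    print(validate(demo, set(range(59104, 59191)), 2, {59103, 59191}))
```

Against the real file, the call would use the rows loaded with `json.load` from contributions/cities/IT.json, `expected_len=9860`, `dropped_ids=set(range(59104, 59191))`, and `neighbour_ids={59103, 59191}`.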

Scope

Fix details appended to .github/fixes-docs/FIX_1349_SUMMARY.md.

dr5hn and others added 2 commits April 27, 2026 20:37
… Post + fix KH regex (#1039)

Source: Cambodia Post 2017-reform 6-digit catalogue redistributed via
the seanghay/cambodia-postal-codes JSON. All 25 provinces resolve at
100% via direct numeric-iso2 lookup — the source's "id" field (1-25)
is identical to CSC's state.iso2 for Cambodia provinces. Records
dedupe at (postcode, sangkat + district) granularity.

Also fixes the Cambodia postal_code_regex/format in countries.json:
the previous "#####" / "^(\\d{5})$" never matched Cambodia Post's
post-2017 6-digit codes (e.g. 120101 for Phnom Penh / Khan Chamkar
Mon / Tonle Basak) and would have rejected every legitimate row.
Updated to "######" / "^(\\d{6})$".
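
A quick way to see the mismatch, using only the two patterns and the sample code quoted above (the variable names are illustrative):

```python
import re

# Patterns as they read once the JSON string escapes are decoded.
OLD_REGEX = r"^(\d{5})$"  # previous countries.json value
NEW_REGEX = r"^(\d{6})$"  # value after this commit

sample_code = "120101"  # the 6-digit example given above

old_matches = re.match(OLD_REGEX, sample_code) is not None
new_matches = re.match(NEW_REGEX, sample_code) is not None

print(old_matches)  # False: five digits followed by end-of-string cannot match six
print(new_matches)  # True
```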

Refs #1039.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Removes 87 placeholder "Provincia ..." records (ids 59104-59190)
from contributions/cities/IT.json. These were leftover province-level
pseudo-cities from the pre-#1395 schema; after the city→province
remap, every real comune resolves directly to its province via
state_id, so the placeholders are duplicate concepts.

contributions/cities/IT.json: 9,947 → 9,860.

Refs #1349.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 27, 2026 15:13
dosubot (bot) added the size:M and fixed labels Apr 27, 2026
@dr5hn dr5hn merged commit 2952960 into master Apr 27, 2026
2 of 3 checks passed
@dr5hn dr5hn deleted the agent-1-1777302530 branch April 27, 2026 15:18

Copilot AI left a comment

Pull request overview

Follow-up cleanup for Italy city data to remove province-level placeholder “Provincia …” pseudo-cities left over from the pre-#1395 schema, plus an update aligning Cambodia’s postal-code metadata with its existing 6-digit postcode dataset.

Changes:

  • Removed 87 placeholder Provincia … rows (IDs 59104–59190) from contributions/cities/IT.json.
  • Updated Cambodia (KH) postal_code_format/postal_code_regex in contributions/countries/countries.json to 6-digit.
  • Added reproducibility scripts and updated the #1349 fix summary documentation.

Reviewed changes

Copilot reviewed 2 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| contributions/countries/countries.json | Updates Cambodia postal code format/regex to 6-digit. |
| contributions/cities/IT.json | Drops 87 province-level placeholder pseudo-city rows. |
| bin/scripts/sync/import_cambodia_postcodes.py | Adds importer script for generating contributions/postcodes/KH.json. |
| bin/scripts/fixes/italy_drop_provincia_placeholders.py | Adds defensive script to remove the targeted Italy placeholder rows. |
| .github/fixes-docs/FIX_1349_SUMMARY.md | Documents the follow-up cleanup and validation results. |

Comment on lines +2584 to +2585
"postal_code_format": "######",
"postal_code_regex": "^(\\d{6})$",

Copilot AI Apr 27, 2026

This PR introduces Cambodia postcode format/regex changes and a new KH importer script, but the PR title/description focus on the Italy placeholder-city cleanup. Consider updating the PR description/title to mention the Cambodia changes or splitting them into a separate PR to keep scope and review context aligned.

| `IT.json` city records | 9,947 | 9,860 |
| Rows starting with `Provincia ` | 87 | 0 |

(Note: the prompt's expected post-#1479 baseline of 9,941 was off by 6; the correct baseline at the time of this PR was 9,947, confirmed by `git log` and `jq '. \| length'`. Range arithmetic 59190 − 59104 + 1 = 87, not 88.)

Copilot AI Apr 27, 2026

This note references a "post-#1479 baseline"; that PR number doesn’t appear elsewhere in this summary and seems unrelated to the #1349/#1395 context described above. If this is a typo, update it to the correct reference to avoid confusion when tracing the history.

Suggested change
(Note: the prompt's expected post-#1479 baseline of 9,941 was off by 6; the correct baseline at the time of this PR was 9,947, confirmed by `git log` and `jq '. \| length'`. Range arithmetic 59190 − 59104 + 1 = 87, not 88.)
(Note: the prompt's expected post-#1395 baseline of 9,941 was off by 6; the correct baseline at the time of this PR was 9,947, confirmed by `git log` and `jq '. \| length'`. Range arithmetic 59190 − 59104 + 1 = 87, not 88.)

Labels

data:cities, data:countries, data:postcodes, fixed, large-contribution, size:M

Development

Successfully merging this pull request may close these issues.

[Bug]: Italy seems to be totally wrong

2 participants