
fix(ES): drop 22 admin-level placeholder rows from cities (#1498) #1516

Merged
dr5hn merged 1 commit into master from fix/issue-1498-es-drop-provincia-rows
May 5, 2026


@dr5hn
Owner

@dr5hn dr5hn commented May 4, 2026

Refs #1498.

Drops 22 admin-level placeholder rows from contributions/cities/ES.json:

  • 21 "Provincia de X" / "Província de X" rows (ids 36362, 36364, 36365, 36373, 36375, 36376, 36377, 36379, 36381, 36383, 36385, 36386, 36387, 36389, 36390, 36391, 36392, 36393, 36394, 36396, 36400). Spain's states.json already lists the 50 provinces as proper states, so these pseudo-cities are duplicate concepts. Their own state_code values are inconsistent (e.g. "Provincia de Burgos" parented under LE/León, "Provincia de Zaragoza" under HU/Huesca), confirming stub-data status.
  • 1 cross-state Alicante stub (id 32244, name Alicante, state_code=V) — exactly the cross-province leak the reporter flagged. The canonical Alicante row is id 152158 (Alicante/Alacant, state_code=A) under the proper Alicante province.
| Metric | Before | After |
| --- | --- | --- |
| ES.json rows | 8,427 | 8,405 |
| Provincia * / Província * rows | 21 | 0 |
| Cross-state Alicante stub | 1 | 0 |
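
For readers who want to reproduce the "Provincia * / Província *" count, a name-pattern scan along the following lines is one way to surface candidate rows. This is a minimal sketch, not the audit the PR actually ran: the `name` field, the `find_placeholder_rows` helper, and the sample rows (including their ids) are assumptions made for illustration.

```python
import re

# Matches names starting with "Provincia de ..." (Spanish) or
# "Província de ..." (Catalan/Valencian spelling with an accented i).
PLACEHOLDER_RE = re.compile(r"^Prov[ií]ncia\s+de\s+\S", re.IGNORECASE)


def find_placeholder_rows(rows):
    """Return the rows whose name looks like a province-level placeholder."""
    return [row for row in rows if PLACEHOLDER_RE.match(row.get("name", ""))]


# Hypothetical sample rows; the ids are made up and are not from ES.json.
sample = [
    {"id": 1, "name": "Provincia de Burgos", "state_code": "LE"},
    {"id": 2, "name": "Província de Castelló", "state_code": "CS"},
    {"id": 3, "name": "Burgos", "state_code": "BU"},
]

candidates = find_placeholder_rows(sample)
print([row["id"] for row in candidates])
```

A scan like this only proposes candidates. The PR itself drops rows by explicit id, so a pattern match alone never deletes anything.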

Implementation

bin/scripts/fixes/spain_drop_provincia_placeholders.py uses an explicit id allowlist with per-id name/state verification. It refuses to drop any row whose name or state has shifted from what was audited, and it is idempotent.
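
The allowlist-plus-verification approach described above could look roughly like this. It is a sketch under stated assumptions, not the contents of the actual script: the `AUDITED` table and the `drop_audited_rows` helper are hypothetical names, only the Alicante stub (id 32244) is filled in from the PR text, and rows are assumed to carry `id`, `name`, and `state_code` fields.

```python
# Hypothetical audit table: id -> (expected name, expected state_code).
# Only id 32244 comes from the PR text; a real table would list all 22 ids.
AUDITED = {
    32244: ("Alicante", "V"),
}


def drop_audited_rows(rows, audited=AUDITED):
    """Return (kept_rows, dropped_ids); raise if an audited row has drifted."""
    kept, dropped = [], []
    for row in rows:
        expected = audited.get(row["id"])
        if expected is None:
            # Not on the allowlist: always keep.
            kept.append(row)
            continue
        actual = (row["name"], row["state_code"])
        if actual != expected:
            # The row changed since the audit, so refuse to drop it blindly.
            raise ValueError(
                f"id {row['id']}: expected {expected}, found {actual}"
            )
        dropped.append(row["id"])
    return kept, dropped


rows = [
    {"id": 32244, "name": "Alicante", "state_code": "V"},
    {"id": 152158, "name": "Alicante/Alacant", "state_code": "A"},
]
kept, dropped = drop_audited_rows(rows)
# Running it again on the result drops nothing, which is the idempotence property.
kept_again, dropped_again = drop_audited_rows(kept)
```

Keying the drop on ids rather than on a name pattern means a later rename or re-parenting of a row surfaces as an error instead of a silent deletion.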

Validation

  • Schema: 0 errors.
  • Cross-reference: 0 errors. Every state_id resolves to an ES state and state_code matches the resolved state's iso2.
  • Coordinate-bounds: 127 out-of-box violations (down from 129 on master, because the dropped rows included 2 with invalid coordinates). The remaining 127 are all Canary Islands rows (state codes TF, GC), pre-existing, and follow the same pattern as IT/Lampedusa in #1395, "feat(IT): remap cities to metropolitan cities and provinces (#1349)".
  • Same-name + ≤5km duplicate pairs: 45, unchanged from master (all pre-existing).
  • python3 -m json.tool parses cleanly; normalize_json.py is a no-op.
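
Two of the checks listed above, the cross-reference check and the same-name proximity check, can be approximated as follows. This is an illustrative sketch, not the repository's validator: the function names are hypothetical, the field names (`state_id`, `state_code`, `iso2`, `country_code`, `latitude`, `longitude`) are assumed, and the haversine great-circle distance is swapped in because the PR does not say which distance measure was used.

```python
import math
from collections import defaultdict
from itertools import combinations


def cross_reference_errors(cities, states, country_code="ES"):
    """List (city_id, reason) for rows whose state_id or state_code is off."""
    by_id = {s["id"]: s for s in states if s.get("country_code") == country_code}
    errors = []
    for city in cities:
        state = by_id.get(city["state_id"])
        if state is None:
            errors.append((city["id"], "state_id does not resolve"))
        elif state["iso2"] != city["state_code"]:
            errors.append((city["id"], "state_code mismatch"))
    return errors


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def near_duplicate_pairs(cities, max_km=5.0):
    """Return id pairs that share a name and lie within max_km of each other."""
    by_name = defaultdict(list)
    for city in cities:
        by_name[city["name"]].append(city)
    pairs = []
    for group in by_name.values():
        for a, b in combinations(group, 2):
            dist = haversine_km(
                float(a["latitude"]), float(a["longitude"]),
                float(b["latitude"]), float(b["longitude"]),
            )
            if dist <= max_km:
                pairs.append((a["id"], b["id"]))
    return pairs
```

Grouping by name first keeps the pairwise comparison to rows that could actually be duplicates, rather than comparing every row in the file against every other.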

Scope

Fix details in .github/fixes-docs/FIX_1498_PR_A_SUMMARY.md.

Drops 22 placeholder records from contributions/cities/ES.json:

- 21 "Provincia de X" / "Província de X" rows (ids 36362, 36364, 36365,
  36373, 36375, 36376, 36377, 36379, 36381, 36383, 36385, 36386, 36387,
  36389, 36390, 36391, 36392, 36393, 36394, 36396, 36400). Spanish
  provinces are already represented as proper states in states.json,
  making these pseudo-cities duplicate concepts. Their own state_code
  values are inconsistent (e.g. "Provincia de Burgos" parented under
  state_code=LE), confirming stub-data status.

- 1 cross-state Alicante stub (id 32244, state_code=V) flagged by the
  reporter as a cross-province leak in Valencia's city list. Canonical
  row is id 152158 ("Alicante/Alacant", state_code=A).

Counts: 8,427 -> 8,405 rows. Out-of-bounds coordinate violations drop
from 129 to 127 (the dropped stubs included 2 invalid coords). 0 schema
errors, 0 cross-reference errors, same-name <5km duplicate pairs
unchanged at 45 (all pre-existing).

Refs #1498. Does not close it -- PR-B follow-up retags ~6,920 mistyped
admin-level rows to type=city.
@dosubot dosubot Bot added the size:M label May 4, 2026
@dr5hn dr5hn merged commit 57eb424 into master May 5, 2026
1 check failed
@dr5hn dr5hn deleted the fix/issue-1498-es-drop-provincia-rows branch May 5, 2026 10:03

Labels

data:cities, large-contribution, size:M (this PR changes 30-99 lines, ignoring generated files)

Development

Successfully merging this pull request may close these issues.

[Bug][ES] GetCity returns province-level administrative entries as cities (e.g., 'Provincia de Madrid')

1 participant