feat(FR): remap mainland cities region->department (#1352 PR-E) #1484
Reassigns 8,727 of 10,079 French cities from the 12 metropolitan regions plus the Corsica collectivity (20R) to the correct INSEE department-level state (01-95, 2A, 2B, 75C). Mirrors the IT remap shipped in #1395.

Endpoints like GET /v1/countries/FR/states/03/cities (Allier) used to return [] because all of Allier's communes sat under the parent region ARA. After this fix Allier holds 59 cities. The same was true for every other metropolitan department.

Resolution cascade (offline, dependency-free, idempotent):
1. INSEE name match in current region (region tie-break + nearest coord)
2. INSEE name match anywhere within 25 km
3. 5-NN proximity vote weighted by inverse distance, capped at 25 km

Only state_id / state_code are mutated. name, native, latitude, longitude, wikiDataId, translations, population, timezone are preserved verbatim. 0 unmapped, 0 deleted; re-run produces 0 changes.

Bundles the geo.api.gouv.fr commune dataset (Etalab Licence Ouverte v2.0, ODbL-1.0 compatible) under bin/scripts/fixes/data/ for reproducibility.

Refs #1352; does not close (sibling PRs A/B/C/D handle other facets).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
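The third step of the resolution cascade (a 5-NN vote weighted by inverse distance and capped at 25 km) can be sketched roughly as follows. This is an illustration only, not the code in france_cities_remap.py: the function names, the `lat`/`lon`/`dept` keys and the small distance clamp are assumptions made for the example.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def knn_department(city, communes, k=5, max_km=25.0):
    """Pick a department by inverse-distance vote among the k nearest communes.

    `city` and each commune are dicts with hypothetical keys 'lat' and 'lon';
    communes also carry 'dept'. Returns None when no commune lies within
    max_km, so the caller can report the city as unmapped.
    """
    near = []
    for commune in communes:
        d = haversine_km(city["lat"], city["lon"], commune["lat"], commune["lon"])
        if d <= max_km:
            near.append((d, commune["dept"]))
    near.sort(key=lambda pair: pair[0])

    votes = defaultdict(float)
    for d, dept in near[:k]:
        # Clamp tiny distances so a coincident point cannot divide by zero.
        votes[dept] += 1.0 / max(d, 1e-3)

    if not votes:
        return None
    return max(votes.items(), key=lambda kv: kv[1])[0]
```

Weighting by inverse distance lets one very close commune outvote several farther ones, which matters near department borders.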
Pull request overview
Implements the France mainland city “region → department” parent remap to address issue #1352, aligning contributions/cities/FR.json city state_id/state_code values with INSEE department-level subdivisions instead of metropolitan regions.
Changes:
- Adds an offline/idempotent remap script (`france_cities_remap.py`) that resolves each city to an INSEE commune and maps it to the correct department (`state.iso2`), including the `75` → `75C` override.
- Commits a structured run report (`france_cities_remap.report.json`) summarizing remap counts and a sample of per-city annotations.
- Adds fix documentation (`FIX_1352_PR_E_SUMMARY.md`) describing methodology, validation, and outcomes.
Reviewed changes
Copilot reviewed 1 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| bin/scripts/fixes/france_cities_remap.py | New remap script to reassign FR cities from region-level to department-level state codes/ids using geo.api.gouv.fr commune data. |
| bin/scripts/fixes/france_cities_remap.report.json | Structured output report capturing totals, per-source/per-target distributions, and sample annotations. |
| .github/fixes-docs/FIX_1352_PR_E_SUMMARY.md | Documentation of scope, approach, counts, and validation for the FR cities remap. |
| State_code level | Cities (before) | Cities (after) |
|------------------|----------------:|---------------:|
| Metropolitan region (ARA, IDF, NOR, PDL, NAQ, BRE, OCC, GES, CVL, BFC, HDF, PAC) | 8,699 | 0 |
| Corsica collectivity (`20R`) | 28 | 0 |
| Metropolitan department (01–95, `2A`, `2B`, `75C`) | 1,351 | 10,078 |
| Other (overseas: NC, etc.) | 1 | 1 |
The before/after distribution table claims the only non-metropolitan bucket after the remap is a single overseas row (NC), but the committed remap report shows at least two cities remapped to overseas department codes (971 and 974). Please reconcile the summary numbers with the actual output (either update the table/wording to include these overseas department assignments, or adjust the script to exclude overseas departments so the table remains correct).
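One way to reconcile the table with the data is to recompute the buckets directly from the city records. The sketch below assumes the city file is a JSON array of objects with a `state_code` field; the bucket names, helper names and file layout are illustrative, not taken from the repository.

```python
import json
from collections import Counter

METRO_REGIONS = {
    "ARA", "IDF", "NOR", "PDL", "NAQ", "BRE",
    "OCC", "GES", "CVL", "BFC", "HDF", "PAC",
}


def is_metro_department(code):
    """True for 01-95, 2A, 2B and 75C, the range quoted in the summary table."""
    if code in {"2A", "2B", "75C"}:
        return True
    return (len(code) == 2 and code.isascii() and code.isdigit()
            and 1 <= int(code) <= 95)


def bucket(code):
    """Classify a state_code into one of the four table rows."""
    if code in METRO_REGIONS:
        return "metropolitan region"
    if code == "20R":
        return "Corsica collectivity"
    if is_metro_department(code):
        return "metropolitan department"
    return "other"  # overseas codes such as 971, 974 or NC land here


def distribution(cities):
    """Count cities per bucket."""
    return Counter(bucket(city["state_code"]) for city in cities)


def load_cities(path):
    """Read the city list; assumes a top-level JSON array (an assumption)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Running `distribution(load_cities("contributions/cities/FR.json"))` before and after the remap would show whether three-character overseas codes such as 971 and 974 inflate the "other" row beyond the single city the table reports.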
…communes (#1352 PR-A) (#1394)

* feat(FR): diff cities against data.gouv.fr, add missing metropolitan communes (#1352 PR-A)

Adds 455 metropolitan French communes (population ≥ 2,000) that were missing from contributions/cities/FR.json relative to the canonical INSEE list at data.gouv.fr. Includes large communes-nouvelles created since 2015, e.g. Cherbourg-en-Cotentin (78K), Évry-Courcouronnes (66K), Saint-Ouen-sur-Seine (53K), Oullins-Pierre-Bénite (38K).

The diff script (bin/scripts/fixes/france_cities_diff.py) produces a structured report and a conservative merge proposal:
- Matches by (state_code, normalised name); normalisation handles œ/æ ligatures and lès/lez preposition variants.
- Department overrides for 2A/2B/48/52/55 follow existing FR.json convention.
- 1,194 cross-region matches (cities under wrong state) are flagged for PR-B, not auto-moved.
- 643 "extra" CSC records (obsolete/merged communes, quartiers, dept names) are flagged for PR-C.
- Overseas territories excluded (PR-D).

Validation: 0 schema errors, 0 cross-reference errors, 0 coord-bounds violations, 0 exact-name same-state duplicates. Full breakdown in .github/fixes-docs/FIX_1352_PR_A_SUMMARY.md.

Refs #1352

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(FR): remap PR-A's 455 new communes from region to department state_code

After cherry-picking PR-A onto post-PR-E master, ran france_cities_remap.py (the script committed in PR-E #1484) to remap the 455 newly-added communes from their authored region codes (NOR, PDL, ARA, etc.) to the correct INSEE department codes (50 for Manche, 14 for Calvados, etc.). Verified: 0 region-coded rows remain, 0 invalid state_ids. Allier (state_code=03) goes from 59 to 60 cities.

Refs: #1352

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
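The (state_code, normalised name) matching described for the diff script could look something like the sketch below. It is an assumption about how ligatures and the lès/lez variants might be folded, not a copy of france_cities_diff.py; `normalise_name` and `match_key` are hypothetical names.

```python
import re
import unicodedata

# NFKD does not decompose these ligatures, so expand them explicitly.
LIGATURES = str.maketrans({"œ": "oe", "Œ": "oe", "æ": "ae", "Æ": "ae"})


def normalise_name(name):
    """Fold a commune name to a comparison key."""
    s = name.translate(LIGATURES).lower()
    # Strip diacritics: decompose, then drop the combining marks.
    s = unicodedata.normalize("NFKD", s)
    s = "".join(ch for ch in s if not unicodedata.combining(ch))
    # Treat hyphens, apostrophes and runs of whitespace as one separator.
    s = re.sub(r"[-'’\s]+", " ", s).strip()
    # After accent stripping "lès" is already "les"; fold "lez" onto it too.
    s = re.sub(r"\blez\b", "les", s)
    return s


def match_key(state_code, name):
    """Key used to compare a city record against a reference commune."""
    return (state_code, normalise_name(name))
```

With this folding, `Villeneuve-lès-Avignon` and `Villeneuve-lez-Avignon` produce the same key, so a spelling variant alone would not be reported as a missing commune.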
…hy (#1489)

Customer-facing follow-up to #1349 (Italy) and #1352 (France). Cities were re-parented onto departments (FR) and provinces (IT) by #1395 / #1394 / #1393 / #1400 / #1484, but the state records themselves still carried inconsistent 'level' values, blocking downstream filters like "all departments == level=2" or "all regions == level=1".

bin/scripts/fixes/states_level_normalise.py drives the change:
- FR: 29 region-tier rows None -> 1 (13 metro regions, 3 special metro collectivities incl. Corse + Alsace + Métropole de Lyon, 13 overseas regions/collectivities/territories/dependency). 95 metropolitan departments unchanged at level=2.
- IT: 103 rows updated. Final state: 20 at level=1 (15 region + 5 autonomous region) and 106 at level=2 (80 province + 14 metropolitan city + 6 free municipal consortium + 4 decentralized regional entity + 2 autonomous province).

Only the 'level' field is touched; idempotent on re-run; non-FR/IT states untouched.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
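The "only 'level' is touched, idempotent on re-run" behaviour can be illustrated with a small sketch. The `country_code` and `type` keys, and the partial lookup table below, are assumptions for the example; the real states_level_normalise.py may derive the target level differently.

```python
# Hypothetical, partial lookup: (country_code, subdivision type) -> level.
LEVEL_BY_TYPE = {
    ("FR", "metropolitan region"): 1,
    ("FR", "metropolitan department"): 2,
    ("IT", "region"): 1,
    ("IT", "autonomous region"): 1,
    ("IT", "province"): 2,
    ("IT", "metropolitan city"): 2,
}


def normalise_levels(states):
    """Set 'level' where it differs from the target; return rows changed.

    Rows whose (country, type) pair is not in the table are left alone,
    which keeps other countries untouched.
    """
    changed = 0
    for state in states:
        target = LEVEL_BY_TYPE.get((state.get("country_code"), state.get("type")))
        if target is not None and state.get("level") != target:
            state["level"] = target
            changed += 1
    return changed
```

Because a row is written only when its current value differs from the target, a second run reports zero changes.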
Refs #1352 — does NOT close. The FR equivalent of #1395 (Italy region→province remap). Sibling PRs #1394 (PR-A diff/additions), #1393 (PR-B), #1400 (PR-D), #1392 (PR-C) cover other facets of the same issue and do not remap existing cities.
Customer report (Allier): `GET /v1/countries/FR/states/03/cities` previously returned `[]` because every Allier commune sat under the parent region `ARA`. After this PR, Allier holds 59 cities (Vichy, Moulins, Montluçon, …). The same was true for every other metropolitan department.

What

`bin/scripts/fixes/france_cities_remap.py`: offline, dependency-free, idempotent. Reassigns 8,727 of 10,079 FR cities from the 12 metropolitan regions plus the Corsica collectivity to the correct INSEE department-level entity (`01–95`, `2A`, `2B`, `75C`). Only `state_id` and `state_code` are mutated; `name`, `native`, `latitude`, `longitude`, `wikiDataId`, `translations` etc. are preserved verbatim.

Before / after distribution
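| State_code level | Cities (before) | Cities (after) |
|------------------|----------------:|---------------:|
| Metropolitan region (ARA, IDF, NOR, PDL, NAQ, BRE, OCC, GES, CVL, BFC, HDF, PAC) | 8,699 | 0 |
| Corsica collectivity (`20R`) | 28 | 0 |
| Metropolitan department (`01–95`, `2A`, `2B`, `75C`) | 1,351 | 10,078 |
| Other (overseas: NC, etc.) | 1 | 1 |

Top 5 target departments after remap: `55` (500), `52` (428), `59` (345), `62` (238), `2B` (237).

Per-resolution-path counts

- `name_unique`: single name match
- `name_region`: one in-region candidate among multiple
- `name_region_multi`: multi in-region, closest by coord
- `name_other_region`: name match outside region within 25 km
- `proximity_knn`: 5-NN inverse-distance vote, capped at 25 km

Proximity-pass distance distribution: 499 of 687 within 3 km, max 8.56 km, none above 10 km.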
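The name-based resolution paths can be read as a small decision procedure. The sketch below is one interpretation of the path names, not the script's actual logic: in particular it assumes `name_unique` means exactly one candidate nationwide, and the `region` key and `distance_km` callable are hypothetical.

```python
def resolve_by_name(city_region, candidates, distance_km, max_km=25.0):
    """Choose among communes that share the city's normalised name.

    `candidates` are dicts with a hypothetical 'region' key; `distance_km`
    maps a candidate to its distance from the city. Returns
    (commune, path), or (None, None) so the caller can fall back to the
    proximity vote.
    """
    if not candidates:
        return None, None
    if len(candidates) == 1:
        return candidates[0], "name_unique"

    in_region = [c for c in candidates if c["region"] == city_region]
    if len(in_region) == 1:
        return in_region[0], "name_region"
    if len(in_region) > 1:
        # Several homonyms in the same region: take the closest one.
        return min(in_region, key=distance_km), "name_region_multi"

    # No homonym in the city's region: accept the nearest one if close enough.
    nearest = min(candidates, key=distance_km)
    if distance_km(nearest) <= max_km:
        return nearest, "name_other_region"
    return None, None
```

Returning the path label alongside the match makes it straightforward to tally per-path counts for a run report.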
Mapping source

`https://geo.api.gouv.fr/communes` (Licence Ouverte v2.0 / Etalab, ODbL-1.0 compatible). 34,969 communes bundled at `bin/scripts/fixes/data/geo-api-gouv-communes.json`. INSEE `codeDepartement` = our `state.iso2` for every metropolitan dept, with one override: `75` → `75C` (Paris collectivity).

Notes / known limitations

- Métropole de Lyon (`69M`): upstream `codeDepartement` is just `69` for every dept-69 commune, so all 142 Lyon-region rows go to state `69` (Rhône). Splitting `69M` out is left as a follow-up; none of our region-coded rows currently distinguished it either.
- … `Le Pin-en-Mauges` no longer correspond to a separate INSEE commune (merged in the past decade), so they take the dept of their administrative successor via the proximity pass. PR-A's `extra_in_csc` list (643 entries) provides the surface for a separate cleanup PR if maintainers want to drop the historical names.
- … `state_code=03` now returns 59 cities.

Full methodology and edge cases in `.github/fixes-docs/FIX_1352_PR_E_SUMMARY.md`.

🤖 Generated with Claude Code
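The mapping rule in the description (INSEE `codeDepartement` equals `state.iso2`, with `75` → `75C`) together with the "only `state_id` and `state_code` are mutated" guarantee might be expressed as below. `DEPT_OVERRIDES`, `state_code_for`, `reassign` and the `state_ids` lookup are hypothetical names for this sketch.

```python
# The single override named in the PR description.
DEPT_OVERRIDES = {"75": "75C"}


def state_code_for(commune):
    """Translate a commune's INSEE codeDepartement into a state code."""
    dept = commune["codeDepartement"]
    return DEPT_OVERRIDES.get(dept, dept)


def reassign(city, commune, state_ids):
    """Point `city` at the commune's department; return True if it changed.

    `state_ids` maps state_code -> numeric state id (assumed lookup).
    Only 'state_code' and 'state_id' are written; every other field of
    `city` is left as it was.
    """
    code = state_code_for(commune)
    new_id = state_ids[code]
    if city["state_code"] == code and city["state_id"] == new_id:
        return False  # already correct, so a re-run is a no-op
    city["state_code"] = code
    city["state_id"] = new_id
    return True
```

Summing the return values over all cities gives the change count for a run, which should be 0 on a second pass if the remap is idempotent as described.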