feat(FR): diff cities against data.gouv.fr, add missing metropolitan communes (#1352 PR-A) #1394
Merged
This was referenced Apr 27, 2026
Weekly data-quality review (2026-04-27)
Verdict: clean
Checks
Advisory (non-blocking)
🤖 Automated weekly review by Claude (sonnet-4-6). Generated by Claude Code.
…communes (#1352 PR-A)

Adds 455 metropolitan French communes (population ≥ 2,000) that were missing from contributions/cities/FR.json relative to the canonical INSEE list at data.gouv.fr. Includes large communes created or renamed since 2015 (communes nouvelles and renames), e.g., Cherbourg-en-Cotentin (78K), Évry-Courcouronnes (66K), Saint-Ouen-sur-Seine (53K), Oullins-Pierre-Bénite (38K).

The diff script (bin/scripts/fixes/france_cities_diff.py) produces a structured report and a conservative merge proposal:
- Matches by (state_code, normalised name); normalisation handles œ/æ ligatures and lès/lez preposition variants.
- Department overrides for 2A/2B/48/52/55 follow the existing FR.json convention.
- 1,194 cross-region matches (cities under the wrong state) are flagged for PR-B, not auto-moved.
- 643 "extra" CSC records (obsolete/merged communes, quartiers, dept names) are flagged for PR-C.
- Overseas territories are excluded (PR-D).

Validation: 0 schema errors, 0 cross-reference errors, 0 coord-bounds violations, 0 exact-name same-state duplicates. Full breakdown in .github/fixes-docs/FIX_1352_PR_A_SUMMARY.md.

Refs #1352

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
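The matching key described above can be sketched roughly as follows. This is not the code in `france_cities_diff.py`. `normalise_name` and `match_key` are hypothetical names.

Only the œ/æ and lès/lez handling comes from the PR. Two further steps are assumptions of this sketch:
- Accents are stripped.
- Hyphens and apostrophes are treated as word separators.

```python
import unicodedata

# NFKD leaves œ/æ intact (they have no decomposition), so expand them by hand.
_LIGATURES = {"œ": "oe", "Œ": "oe", "æ": "ae", "Æ": "ae"}


def normalise_name(name: str) -> str:
    """Fold a commune name to a comparison key (hypothetical sketch)."""
    for lig, repl in _LIGATURES.items():
        name = name.replace(lig, repl)
    # Assumption: accents are stripped (decompose, then drop combining marks).
    decomposed = unicodedata.normalize("NFKD", name)
    name = "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()
    # Assumption: hyphens and apostrophes act as word separators.
    for sep in ("-", "'", "\u2019"):
        name = name.replace(sep, " ")
    tokens = name.split()
    # "lès" is already "les" after accent stripping; fold "lez" onto it too.
    # The first token is left alone so a commune named "Lez" keeps its name.
    tokens = tokens[:1] + ["les" if t == "lez" else t for t in tokens[1:]]
    return " ".join(tokens)


def match_key(state_code: str, name: str) -> tuple[str, str]:
    """Key used to pair a CSC record with an upstream INSEE record."""
    return (state_code, normalise_name(name))
```

With this key, `Villeneuve-lès-Avignon` and `Villeneuve-lez-Avignon` collapse to the same value. `Marcq-en-Barœul` and `Marcq-en-Baroeul` also collapse to one value.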
…e_code

After cherry-picking PR-A onto post-PR-E master, ran france_cities_remap.py (the script committed in PR-E #1484) to remap the 455 newly added communes from their authored region codes (NOR, PDL, ARA, etc.) to the correct INSEE department codes (50 for Manche, 14 for Calvados, etc.).

Verified: 0 region-coded rows remain, 0 invalid state_ids. Allier (state_code=03) goes from 59 to 60 cities.

Refs: #1352

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
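A region-to-department remap and its two post-checks might look roughly like the sketch below. This is not the actual `france_cities_remap.py`. It relies on three assumptions:
- Each new row still carries its upstream INSEE commune code in a hypothetical `insee_code` field.
- CSC region codes are the ISO 3166-2:FR values.
- `state_id_by_code` is supplied by the caller.

The department prefix rule itself is standard: two characters for metropolitan France (including `2A`/`2B`), three for codes starting with `97`.

```python
# ISO 3166-2:FR region codes, assumed to match the region iso2 values in CSC.
REGION_CODES = {
    "ARA", "BFC", "BRE", "CVL", "20R", "GES", "HDF",
    "IDF", "NOR", "NAQ", "OCC", "PDL", "PAC",
}


def department_from_insee(insee_code: str) -> str:
    """Department code embedded in a five-character INSEE commune code."""
    if insee_code.startswith("97"):
        return insee_code[:3]  # overseas departments use three characters
    return insee_code[:2]  # metropolitan, including "2A"/"2B" for Corsica


def remap(rows: list[dict], state_id_by_code: dict[str, int]) -> list[dict]:
    """Move region-coded rows onto their department (uses hypothetical 'insee_code')."""
    for row in rows:
        if row["state_code"] in REGION_CODES:
            dept = department_from_insee(row["insee_code"])
            row["state_code"] = dept
            row["state_id"] = state_id_by_code[dept]  # KeyError if dept is unknown
    return rows


def verify(rows: list[dict], state_id_by_code: dict[str, int]) -> tuple[int, int]:
    """Return (region-coded rows remaining, rows whose state_id is unknown)."""
    valid_ids = set(state_id_by_code.values())
    region_coded = sum(1 for r in rows if r["state_code"] in REGION_CODES)
    invalid_ids = sum(1 for r in rows if r["state_id"] not in valid_ids)
    return region_coded, invalid_ids
```

Raising `KeyError` on an unknown department is a deliberate choice in this sketch: a silent fallback would defeat the "0 invalid state_ids" check.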
caac707 to a674624
Pull request overview
Adds a France-focused diff/merge workflow to identify missing metropolitan communes against INSEE (data.gouv.fr), commit an audit report, and document the methodology for issue #1352 (PR-A of a planned series).
Changes:
- Add 455 metropolitan French commune records to `contributions/cities/FR.json` (population ≥ 2,000).
- Add a new diff script (`bin/scripts/fixes/france_cities_diff.py`) plus a committed diagnostic report JSON artifact.
- Add a methodology write-up under `.github/fixes-docs/`.
Reviewed changes
Copilot reviewed 1 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| contributions/cities/FR.json | Adds missing metropolitan communes to the contributed France cities dataset. |
| bin/scripts/fixes/france_cities_diff.py | New script to diff CSC vs INSEE/geo.api.gouv.fr and generate merge/report artifacts. |
| bin/scripts/fixes/france_cities_diff.report.json | Committed snapshot of diff statistics and representative samples for reviewer audit. |
| .github/fixes-docs/FIX_1352_PR_A_SUMMARY.md | Documents scope, sources, matching strategy, and validation for PR-A. |
Comment on lines +78 to +85
- `target_state_code` is derived from the upstream `(departement, region)` pair using:
  - INSEE region code → CSC region iso2 (e.g., `84 → ARA`, `11 → IDF`, `94 → 2A/2B`).
  - **Department-level overrides** for the five departments that the existing
    FR.json stores at department level rather than region level: `2A`, `2B`,
    `48`, `52`, `55`. (Discovered empirically; new records in those depts must
    follow suit to match existing convention.)

If the primary state lookup misses, the script also tries every other metropolitan CSC state code. A hit there is **not** a successful match: it's a *cross-region match*, flagged for PR-B (region reclassification). The upstream record is considered "missing" only if no fallback hit exists.
Comment on lines +100 to +105
Following the existing FR.json convention:

```json
{
  "name": "Évry-Courcouronnes", // INSEE official French name
  "state_id": 4796, "state_code": "IDF",
  …
```
dr5hn added a commit that referenced this pull request Apr 27, 2026
…hy (#1489)

Customer-facing follow-up to #1349 (Italy) and #1352 (France). Cities were re-parented onto departments (FR) and provinces (IT) by #1395 / #1394 / #1393 / #1400 / #1484, but the state records themselves still carried inconsistent 'level' values. This blocked downstream filters like "all departments == level=2" or "all regions == level=1".

bin/scripts/fixes/states_level_normalise.py drives the change:
- FR: 29 region-tier rows None -> 1 (13 metro regions, 3 special metro collectivities incl. Corse + Alsace + Métropole de Lyon, 13 overseas regions/collectivities/territories/dependency). 95 metropolitan departments unchanged at level=2.
- IT: 103 rows updated. Final state: 20 at level=1 (15 region + 5 autonomous region) and 106 at level=2 (80 province + 14 metropolitan city + 6 free municipal consortium + 4 decentralized regional entity + 2 autonomous province).

Only the 'level' field is touched; idempotent on re-run; non-FR/IT states untouched.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
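The three properties claimed for the level normalisation can be sketched as a rough illustration:
- Only `level` is written.
- Non-FR/IT states are untouched.
- A re-run changes nothing.

This is not `states_level_normalise.py`. The `country_code`/`type`/`level` field names are assumed. The Italian type names come from the commit message; the French type strings are hypothetical placeholders.

```python
# Hypothetical type -> level tables; the real script's mapping may differ.
LEVEL_BY_TYPE = {
    "FR": {"metropolitan region": 1, "overseas region": 1, "metropolitan department": 2},
    "IT": {
        "region": 1, "autonomous region": 1,
        "province": 2, "metropolitan city": 2, "free municipal consortium": 2,
        "decentralized regional entity": 2, "autonomous province": 2,
    },
}


def normalise_levels(states: list[dict]) -> int:
    """Set 'level' from 'type' for FR/IT states; return how many rows changed."""
    changed = 0
    for state in states:
        table = LEVEL_BY_TYPE.get(state.get("country_code"))
        if table is None:
            continue  # non-FR/IT states are never touched
        target = table.get(state.get("type"))
        if target is not None and state.get("level") != target:
            state["level"] = target  # the only field this function writes
            changed += 1
    return changed
```

Because a row is written only when its `level` differs from the target, a second run returns 0. That is one simple way to get the idempotence the commit describes.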
Summary
Adds 455 metropolitan French communes (population ≥ 2,000) that were missing from `contributions/cities/FR.json` relative to the canonical INSEE commune list at data.gouv.fr.

This is PR-A of 4 in the issue #1352 plan. Its siblings, PR-B (region reclassification), PR-C (obsolete/merged commune cleanup), and PR-D (overseas territories), are tracked separately and are explicitly out of scope here.
Refs #1352 — does not close the issue.
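The scope filter (metropolitan only, population ≥ 2,000, threshold tunable via `--pop-threshold`) might be expressed roughly as below.

Several details are assumptions of this sketch:
- The default of 2000 simply mirrors this PR's threshold.
- The `codeDepartement`/`population` keys follow geo.api.gouv.fr's commune JSON as we understand it.
- Treating codes that start with `97`/`98` as overseas is a simplification.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Sketch of a communes scope filter")
    # Default mirrors this PR's threshold; the real script's default may differ.
    parser.add_argument("--pop-threshold", type=int, default=2000)
    return parser.parse_args(argv)


def is_candidate(commune: dict, pop_threshold: int) -> bool:
    """True for a metropolitan commune at or above the population threshold."""
    dept = commune["codeDepartement"]
    # Overseas departments and collectivities use codes starting with 97/98.
    metropolitan = not dept.startswith(("97", "98"))
    # A missing or null population is treated as 0, i.e. out of scope.
    return metropolitan and (commune.get("population") or 0) >= pop_threshold
```

Keeping the threshold on the command line means a reviewer can rerun the diff at, say, 5,000 without editing code.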
What this PR does
- 455 new communes in `contributions/cities/FR.json`. All metropolitan, all population ≥ 2,000, all with coordinates and population from `geo.api.gouv.fr`. Top adds include Cherbourg-en-Cotentin (78K), Évry-Courcouronnes (66K), Saint-Ouen-sur-Seine (53K), Oullins-Pierre-Bénite (38K), Herblay-sur-Seine (32K), Le Chesnay-Rocquencourt (31K).
- New diff script `bin/scripts/fixes/france_cities_diff.py`: pure-Python, dependency-free; matches by `(state_code, normalised_name)` with œ/æ ligature and `lès`/`lez` preposition handling.
- Committed diagnostic report `bin/scripts/fixes/france_cities_diff.report.json` for reviewer audit and follow-up PRs.
- Methodology write-up in `.github/fixes-docs/FIX_1352_PR_A_SUMMARY.md`.

Diagnostic findings flagged for sibling PRs (NOT fixed here)
- 1,194 `cross_region_matches` (CSC city under the wrong region, incl. Ajaccio/Bastia under `20R` instead of `2A`/`2B`): PR-B.
- 643 `extra` CSC records (obsolete/merged communes like `Aime`/`Annecy-le-Vieux`/`Ancenis`; Marseille quartiers like `Arenc`/`La Villette`; dept names mistakenly stored as cities like `Alpes-Maritimes`/`Ardennes`): PR-C.

Test plan
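Two of the validation checks this PR reports are the coordinate-bounds check and the exact-name same-state duplicate check. They could look roughly like this sketch.

The bounding box is an approximate rectangle around metropolitan France including Corsica; it is an assumption, not the project's actual bounds. The `latitude`/`longitude`/`state_code`/`name` field names are also assumed.

```python
from collections import Counter

# Approximate box for metropolitan France incl. Corsica (assumed values).
LAT_MIN, LAT_MAX = 41.3, 51.1
LON_MIN, LON_MAX = -5.2, 9.6


def out_of_bounds(rows: list[dict]) -> list[str]:
    """Names of rows whose coordinates fall outside the metropolitan box."""
    bad = []
    for row in rows:
        # float() accepts both numeric and string-encoded coordinates.
        lat, lon = float(row["latitude"]), float(row["longitude"])
        if not (LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX):
            bad.append(row["name"])
    return bad


def exact_duplicates(rows: list[dict]) -> list[tuple[str, str]]:
    """(state_code, name) pairs that occur more than once."""
    counts = Counter((row["state_code"], row["name"]) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)
```

An exact-name check like this would not flag a near-match such as Bréhan vs Rohan. That case needs the separate manual verification noted in the plan.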
- … `type`/`level`/`parent_id`/`native`/`population`, identical to those produced by the existing 10,079 records (project convention; warnings, not blockers).
- … `state_id`/`country_id` resolves; codes match.
- … (`Bréhan` vs `Rohan`, dept 56): verified to be two genuinely different communes 4.3 km apart, with INSEE codes 56025 vs 56196.
- … `id`/`created_at`/`updated_at`/`flag` fields on new records.

Reviewer notes
- … `villes.min.json` and `geo.api.gouv.fr` dumps. The `--pop-threshold` arg is exposed so the threshold can be tuned without code changes.
- `france_cities_diff.merge.json` and `france_cities_diff.deferred.json` are not committed: the merge is now in FR.json, and the deferred set (~7 MB) can be regenerated.
- … `wikiDataId` are intentionally left empty for new records rather than synthesised; they can be backfilled in a future pass.

🤖 Generated with Claude Code