feat(FR): polish department + region metadata vs data.gouv.fr (#1352 PR-B) #1393
Merged
This was referenced Apr 27, 2026
Weekly data-quality review (2026-04-27)
Verdict: clean
Checks
Advisory (non-blocking)
🤖 Automated weekly review by Claude (sonnet-4-6). Generated by Claude Code.
…PR-B)

Diff repo's 124 FR state records against the authoritative data.gouv.fr INSEE feeds (departements + regions) and apply 14 conservative fixes:

- 11 metropolitan department `native` fields had garbage text fragments (e.g. Ain="Se faire", Aude="entendus", Var="Notre"). Repaired against the feed's `nom`.
- French Guiana (973): `native` was the English string; corrected to `Guyane` per the overseas DROM feed.
- Corse (20R): reclassified from `metropolitan collectivity with special status` → `metropolitan region` (data.gouv.fr lists Corse in the regions feed; Wikipedia/INSEE confirm it is one of France's 18 regions). Also fixed `native` typo `Corsée` → `Corse`.

Adds `bin/scripts/fixes/france_states_diff.py`, a read-only diagnostic that reports real deltas, advisory typography, coverage, and the records outside the feed scope (overseas collectivities, dependencies, etc.).

Out of scope for PR-B (cities = PR-A; overseas city files = PR-D):

- Region `native` typography drift (Grand-Est/Grand Est, etc.)
- Overseas collectivity `native`/`name` cleanups (PM, NC, TF, WF)
- `fips_code` coverage gap (27/124; not in data.gouv.fr feeds)
- Possible timezone errors on DROM (e.g. 973 set to Europe/Paris)

See `.github/fixes-docs/FIX_1352_PR_B_SUMMARY.md` for the full audit.

Refs: #1352

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Force-pushed from `a5c725c` to `6ac8f45`.
Pull request overview
Polishes France subdivision metadata by aligning selected native names and the Corse region classification in the contributions dataset, and adds a read-only diagnostic script plus documentation to audit FR department/region deltas against data.gouv.fr feeds.
Changes:
- Fixes corrupted/incorrect `native` values for 11 metropolitan departments and French Guiana; corrects the Corse `native` typo.
- Reclassifies Corse (`FR-20R`) from special-status collectivity to `metropolitan region`.
- Adds `bin/scripts/fixes/france_states_diff.py` (read-only comparison tool) and a companion audit summary doc.
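The commit calls these "conservative" fixes. One way to keep a field repair conservative is to write the new value only when the field still holds the exact known-bad value. A re-run then does nothing, and unexpected drift raises an error instead of being silently overwritten. The sketch below illustrates that idea and is not the PR's code. `Fix` and `apply_fixes` are hypothetical names. It assumes each record is a dict carrying `country_code`, `iso2`, and the field being repaired, as this thread suggests for `states.json`.

```python
from typing import NamedTuple


class Fix(NamedTuple):
    """One guarded field change (hypothetical helper type)."""
    iso2: str     # FR state code as stored in states.json, e.g. "973"
    field: str    # field to change, e.g. "native" or "type"
    before: str   # known-bad value we expect to find
    after: str    # corrected value


def apply_fixes(records: list[dict], fixes: list[Fix]) -> list[Fix]:
    """Apply each fix only if the field still holds `before`; return the fixes applied.

    Records are mutated in place. A field that already equals `after` is skipped,
    so re-running is a no-op; any other value is unexpected drift and raises.
    """
    by_code = {r["iso2"]: r for r in records if r.get("country_code") == "FR"}
    applied = []
    for fix in fixes:
        record = by_code[fix.iso2]  # KeyError if the record is missing entirely
        current = record.get(fix.field)
        if current == fix.after:
            continue  # already fixed
        if current != fix.before:
            raise ValueError(f"{fix.iso2}.{fix.field}: expected {fix.before!r}, found {current!r}")
        record[fix.field] = fix.after
        applied.append(fix)
    return applied


# Illustrative records (not read from states.json).
states = [
    {"iso2": "973", "country_code": "FR", "native": "French Guiana"},
    {"iso2": "20R", "country_code": "FR", "native": "Corsée",
     "type": "metropolitan collectivity with special status"},
]
fixes = [
    Fix("973", "native", "French Guiana", "Guyane"),
    Fix("20R", "native", "Corsée", "Corse"),
    Fix("20R", "type", "metropolitan collectivity with special status", "metropolitan region"),
]
print(len(apply_fixes(states, fixes)))  # 3
print(len(apply_fixes(states, fixes)))  # 0 on re-run
```

Failing loudly on an unexpected current value matters here. A record edited by another PR in the meantime is surfaced for review rather than clobbered.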
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| contributions/states/states.json | Updates FR department/region native values and changes Corse type to metropolitan region. |
| bin/scripts/fixes/france_states_diff.py | New script to diff FR repo records vs data.gouv.fr department/region feeds and report deltas/coverage. |
| .github/fixes-docs/FIX_1352_PR_B_SUMMARY.md | Documents scope, rationale, and validation results for PR-B of issue #1352. |
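The table describes `france_states_diff.py` as a read-only comparison that separates real deltas, advisory typography, coverage gaps, and out-of-scope records. The sketch below shows that kind of classification. It is not the script's actual code, and `typo_key`, `DiffReport`, and `diff_names` are hypothetical. It assumes both sides have already been reduced to a `code -> name` mapping, for example feed `code`/`nom` against repo `iso2`/`native`. The feed's `code` field name and the join on a shared code are assumptions. Corse, for instance, would need an explicit `94` ↔ `20R` mapping.

```python
import re
from dataclasses import dataclass, field


def typo_key(name: str) -> str:
    """Fold curly apostrophes, hyphens, spacing and case so 'Grand-Est' == 'Grand Est'."""
    name = name.replace("\u2019", "'").replace("-", " ")
    return re.sub(r"\s+", " ", name).strip().casefold()


@dataclass
class DiffReport:
    real: dict = field(default_factory=dict)             # code -> (repo, feed): genuine mismatch
    advisory: dict = field(default_factory=dict)         # code -> (repo, feed): typography only
    missing_in_repo: list = field(default_factory=list)  # in feed, absent from repo
    out_of_scope: list = field(default_factory=list)     # in repo, absent from feed


def diff_names(feed: dict[str, str], repo: dict[str, str]) -> DiffReport:
    """Compare code -> name maps without modifying either side (read-only)."""
    report = DiffReport()
    for code, feed_name in sorted(feed.items()):
        repo_name = repo.get(code)
        if repo_name is None:
            report.missing_in_repo.append(code)
        elif repo_name == feed_name:
            continue
        elif typo_key(repo_name) == typo_key(feed_name):
            report.advisory[code] = (repo_name, feed_name)
        else:
            report.real[code] = (repo_name, feed_name)
    report.out_of_scope = sorted(set(repo) - set(feed))
    return report


# Illustrative inputs only.
depts = diff_names({"01": "Ain", "11": "Aude"},
                   {"01": "Se faire", "11": "Aude", "PM": "Saint-Pierre-et-Miquelon"})
print(depts.real)          # {'01': ('Se faire', 'Ain')}
print(depts.out_of_scope)  # ['PM']
regions = diff_names({"44": "Grand Est"}, {"44": "Grand-Est"})
print(regions.advisory)    # {'44': ('Grand-Est', 'Grand Est')}
```

Keeping typography in its own bucket is what lets a report show `Department deltas: 0` while still listing `Grand-Est`/`Grand Est`-style drift as advisory.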
Comments suppressed due to low confidence (1)
contributions/states/states.json:181883
- The French Guiana (iso2=973) state record still has `timezone` set to `Europe/Paris`, but French Guiana's IANA timezone is `America/Cayenne` (also reflected in `contributions/countries/countries.json` for country_code=GF). Since this PR already touches this record, consider correcting the timezone here (and potentially auditing the other overseas regions' timezones in a follow-up).
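The suggested follow-up audit of overseas timezones could start from a small lookup of the expected zone for each DROM. The sketch below assumes records carry `country_code`, `iso2`, and `timezone`. It also assumes the other four DROM use the INSEE department numbers (971, 972, 974, 976) as `iso2`; only 973 is confirmed in this thread. The zone names are standard IANA tz database identifiers. `EXPECTED_DROM_TZ` and `timezone_mismatches` are hypothetical names.

```python
from typing import Optional

# Expected IANA zones for the five DROM; iso2 codes other than 973 are assumed.
EXPECTED_DROM_TZ = {
    "971": "America/Guadeloupe",
    "972": "America/Martinique",
    "973": "America/Cayenne",
    "974": "Indian/Reunion",
    "976": "Indian/Mayotte",
}


def timezone_mismatches(records: list[dict]) -> dict[str, tuple[Optional[str], str]]:
    """Return {iso2: (stored, expected)} for FR DROM records whose `timezone` differs."""
    out = {}
    for r in records:
        if r.get("country_code") != "FR":
            continue
        expected = EXPECTED_DROM_TZ.get(r.get("iso2"))
        if expected is not None and r.get("timezone") != expected:
            out[r["iso2"]] = (r.get("timezone"), expected)
    return out


# Illustrative records, not read from states.json.
sample = [
    {"iso2": "973", "country_code": "FR", "timezone": "Europe/Paris"},
    {"iso2": "974", "country_code": "FR", "timezone": "Indian/Reunion"},
    {"iso2": "01", "country_code": "FR", "timezone": "Europe/Paris"},
]
print(timezone_mismatches(sample))  # {'973': ('Europe/Paris', 'America/Cayenne')}
```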
dr5hn added a commit that referenced this pull request on Apr 27, 2026:
…hy (#1489)

Customer-facing follow-up to #1349 (Italy) and #1352 (France). Cities were re-parented onto departments (FR) and provinces (IT) by #1395 / #1394 / #1393 / #1400 / #1484, but the state records themselves still carried inconsistent `level` values, blocking downstream filters like "all departments == level=2" or "all regions == level=1".

`bin/scripts/fixes/states_level_normalise.py` drives the change:

- FR: 29 region-tier rows None -> 1 (13 metro regions, 3 special metro collectivities incl. Corse + Alsace + Métropole de Lyon, 13 overseas regions/collectivities/territories/dependency). 95 metropolitan departments unchanged at level=2.
- IT: 103 rows updated. Final state: 20 at level=1 (15 region + 5 autonomous region) and 106 at level=2 (80 province + 14 metropolitan city + 6 free municipal consortium + 4 decentralized regional entity + 2 autonomous province).

Only the `level` field is touched; idempotent on re-run; non-FR/IT states untouched.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
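The commit describes a type-driven `level` rewrite that touches only `level`, skips non-FR/IT rows, and is idempotent. The sketch below shows that pattern and is not the contents of `states_level_normalise.py`. `LEVEL_BY_TYPE` and `normalise_levels` are hypothetical. The IT type names and most FR type names come from this thread, while `"metropolitan department"` and `"overseas collectivity with special status"` are assumed spellings of abbreviated entries.

```python
# Hypothetical country_code -> type -> level table. IT names and most FR names
# are copied from this thread; "metropolitan department" and
# "overseas collectivity with special status" are assumed spellings.
LEVEL_BY_TYPE: dict[str, dict[str, int]] = {
    "FR": {
        "metropolitan department": 2,
        "metropolitan region": 1,
        "metropolitan collectivity with special status": 1,
        "European collectivity": 1,
        "overseas region": 1,
        "overseas collectivity": 1,
        "overseas collectivity with special status": 1,
        "overseas territory": 1,
        "dependency": 1,
    },
    "IT": {
        "region": 1,
        "autonomous region": 1,
        "province": 2,
        "metropolitan city": 2,
        "free municipal consortium": 2,
        "decentralized regional entity": 2,
        "autonomous province": 2,
    },
}


def normalise_levels(records: list[dict]) -> int:
    """Set `level` from (country_code, type) and return how many records changed.

    Only `level` is written. Countries without a table and unknown types are
    left alone, and values that already match are skipped, so a second run returns 0.
    """
    changed = 0
    for r in records:
        table = LEVEL_BY_TYPE.get(r.get("country_code"))
        if table is None:
            continue
        level = table.get(r.get("type"))
        if level is not None and r.get("level") != level:
            r["level"] = level
            changed += 1
    return changed


rows = [
    {"country_code": "FR", "type": "metropolitan region", "level": None},
    {"country_code": "IT", "type": "province", "level": 2},
    {"country_code": "DE", "type": "state", "level": None},
]
print(normalise_levels(rows))  # 1: only the FR region changes
print(normalise_levels(rows))  # 0: second run is a no-op
```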
Summary
PR-B of the four-PR Option-C plan for the France data review (#1352). Polishes only department + region metadata in `contributions/states/states.json`, leaving cities (PR-A) and overseas city files (PR-D) untouched.

Authoritative source: data.gouv.fr open feeds (INSEE-derived):

- `departements.min.json` (101 entries): https://github.com/user-attachments/files/25721909/departements.min.json
- `regions.min.json` (18 entries): https://github.com/user-attachments/files/25721911/regions.min.json

A new read-only diff tool, `bin/scripts/fixes/france_states_diff.py`, compares the feeds against repo `country_code='FR'` records and reports real deltas, advisory typography, coverage gaps, and out-of-scope records.

Changes (14 fields, 13 records, all in `contributions/states/states.json`)

11 metropolitan department `native` repairs: the field contained unrelated French text fragments (looked like translation-pipeline garbage) on 11 records. Repaired against data.gouv.fr `nom`:

| `native` before | after |
|---|---|
| Se faire | Ain |
| entendus | Aude |
| Ton | Eure |
| Gardien | Gard |
| Génie | Gers |
| Interne | Indre |
| Parcelle | Lot |
| Quelques | Manche |
| Miuse | Meuse |
| Peau | Haut-Rhin |
| Notre | Var |

1 overseas region `native` repair:

| `native` before | after |
|---|---|
| French Guiana | Guyane |

Corse (20R) reclassification + `native` typo: Wikipedia/INSEE confirm Corse is one of France's 18 regions, and data.gouv.fr lists it in the regions feed (regional code `94`). Reclassifying brings the metropolitan region count to 13 and the special-status collectivity count to 2 (Lyon Métropole + Paris):

| Field | before | after |
|---|---|---|
| `type` | metropolitan collectivity with special status | metropolitan region |
| `native` | Corsée | Corse |

Out of scope (flagged for follow-up)
- Region `native` typography drift vs data.gouv.fr (`Grand-Est` vs `Grand Est`, `Pays-de-la-Loire` vs `Pays de la Loire`, PAC apostrophe; both forms are in official use).
- Overseas collectivity `native`/`name` cleanups (PM, NC, TF, WF): PR-D scope.
- `fips_code` coverage gap (27/124 FR records have FIPS): not in data.gouv.fr feeds.
- Possible timezone errors on DROM (e.g. 973 is set to `Europe/Paris`, should be `America/Cayenne`).

Full audit in `.github/fixes-docs/FIX_1352_PR_B_SUMMARY.md`.

Test plan
- `python3 bin/scripts/fixes/france_states_diff.py` reports `Department deltas: 0`, `Region deltas: 0`.
- `python3 -m json.tool contributions/states/states.json`: JSON valid.
- `id` values still globally unique across `states.json` (5,296 ids).
- `country_id`/`country_code` resolves correctly against `contributions/countries/countries.json`.
- `.github/scripts/utils.js` `validateRecord` against the 13 touched records: 0 errors, identical warning footprint to untouched FR records.
- `type` distribution after change: 95 dept / 13 metro region / 5 overseas region / 5 overseas collectivity / 2 metro collectivity w/ special status / 1 European collectivity / 1 overseas collectivity w/ special status / 1 overseas territory / 1 dependency = 124.

Refs: #1352 (do not close: only PR-B of 4)
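Three of the test-plan checks can be re-run with a short script: unique `id`s, `country_id`/`country_code` resolution, and the FR `type` distribution. The sketch below assumes both files are top-level JSON arrays of objects and that country records expose `id` and `iso2`; those shapes are assumptions, not confirmed by this PR. `check_states` and `load_and_check` are hypothetical helpers, not scripts from the repo.

```python
import json
from collections import Counter


def check_states(states: list[dict], countries: list[dict]) -> dict:
    """Re-run the id-uniqueness, country-reference and FR type-distribution checks."""
    dup_ids = sorted(i for i, n in Counter(s["id"] for s in states).items() if n > 1)

    # Each state's country_id must point at a country whose iso2 equals its country_code.
    country_iso2 = {c["id"]: c["iso2"] for c in countries}
    bad_refs = [s["id"] for s in states
                if country_iso2.get(s["country_id"]) != s["country_code"]]

    fr_types = Counter(s.get("type") for s in states if s["country_code"] == "FR")
    return {"duplicate_ids": dup_ids, "bad_country_refs": bad_refs, "fr_type_counts": fr_types}


def load_and_check(states_path: str, countries_path: str) -> dict:
    """Load both files (json.load raises JSONDecodeError on invalid JSON) and check them."""
    with open(states_path, encoding="utf-8") as f:
        states = json.load(f)
    with open(countries_path, encoding="utf-8") as f:
        countries = json.load(f)
    return check_states(states, countries)
```

Run from the repo root, `load_and_check("contributions/states/states.json", "contributions/countries/countries.json")` should return empty `duplicate_ids` and `bad_country_refs` lists. Its `fr_type_counts` values should sum to 124 if the data matches the distribution above.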
🤖 Generated with Claude Code