
feat(FR): polish department + region metadata vs data.gouv.fr (#1352 PR-B) #1393

Merged
dr5hn merged 1 commit into master from feat/issue-1352-france-states-polish
Apr 27, 2026


dr5hn (Owner) commented Apr 25, 2026

Summary

PR-B of the four-PR Option-C plan for France data review (#1352). Polishes only department + region metadata in contributions/states/states.json, leaving cities (PR-A) and overseas city files (PR-D) untouched.

Authoritative source: the data.gouv.fr open feeds (INSEE-derived) for departments and regions.

A new read-only diff tool, bin/scripts/fixes/france_states_diff.py, compares the feeds against repo country_code='FR' records and reports real deltas, advisory typography, coverage gaps, and out-of-scope records.
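The comparison described above can be illustrated with a minimal sketch. This is not the code of `france_states_diff.py`; it only shows the general shape of a read-only diff. The feed key `code` and the function name `diff_native` are assumptions made for illustration; `nom`, `iso2`, `native`, and `country_code` are the field names mentioned in this PR.

```python
def diff_native(feed_rows, repo_rows):
    """Compare feed names against repo FR records without modifying either input.

    Returns (deltas, coverage_gaps, out_of_scope):
      deltas        - (code, repo native, feed nom) where the two disagree
      coverage_gaps - feed codes with no matching FR record in the repo
      out_of_scope  - FR repo codes that the feed does not cover
    """
    # Assumption: each feed row exposes its code under "code" and its name under "nom".
    feed_by_code = {row["code"]: row["nom"] for row in feed_rows}
    fr_rows = [r for r in repo_rows if r.get("country_code") == "FR"]
    repo_codes = {r["iso2"] for r in fr_rows}

    deltas = [
        (r["iso2"], r["native"], feed_by_code[r["iso2"]])
        for r in fr_rows
        if r["iso2"] in feed_by_code and r["native"] != feed_by_code[r["iso2"]]
    ]
    coverage_gaps = sorted(set(feed_by_code) - repo_codes)
    out_of_scope = sorted(repo_codes - set(feed_by_code))
    return deltas, coverage_gaps, out_of_scope


# Tiny illustrative inputs (not real feed data).
feed = [
    {"code": "01", "nom": "Ain"},
    {"code": "11", "nom": "Aude"},
    {"code": "02", "nom": "Aisne"},
]
repo = [
    {"iso2": "01", "native": "Se faire", "country_code": "FR"},
    {"iso2": "11", "native": "Aude", "country_code": "FR"},
    {"iso2": "NC", "native": "Nouvelle-Calédonie", "country_code": "FR"},
    {"iso2": "BY", "native": "Bayern", "country_code": "DE"},
]
deltas, gaps, out_of_scope = diff_native(feed, repo)
```

Keeping the function free of writes is what makes it safe to run repeatedly as a diagnostic: the fixes themselves are applied separately to `states.json`.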

Changes (14 fields, 13 records — all in contributions/states/states.json)

11 metropolitan department native repairs — the field contained unrelated French text fragments (looked like translation-pipeline garbage) on 11 records. Repaired against data.gouv.fr nom:

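| ISO2 | name | native before → after |
|------|------|-----------------------|
| 01 | Ain | Se faire → Ain |
| 11 | Aude | entendus → Aude |
| 27 | Eure | Ton → Eure |
| 30 | Gard | Gardien → Gard |
| 32 | Gers | Génie → Gers |
| 36 | Indre | Interne → Indre |
| 46 | Lot | Parcelle → Lot |
| 50 | Manche | Quelques → Manche |
| 55 | Meuse | Miuse → Meuse |
| 68 | Haut-Rhin | Peau → Haut-Rhin |
| 83 | Var | Notre → Var |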

1 overseas region native repair:

| ISO2 | name | native before → after |
|------|------|-----------------------|
| 973 | French Guiana | French Guiana → Guyane |

Corse (20R) reclassification + native typo — Wikipedia/INSEE confirm Corse is one of France's 18 regions; data.gouv.fr lists it in the regions feed (regional code 94). Reclassifying brings the metropolitan region count to 13 and the special-status collectivity count to 2 (Lyon Métropole + Paris):

| ISO2 | field | before → after |
|------|-------|----------------|
| 20R | type | metropolitan collectivity with special status → metropolitan region |
| 20R | native | Corsée → Corse |

Out of scope (flagged for follow-up)

  • Region native typography drift vs data.gouv.fr (Grand-Est vs Grand Est, Pays-de-la-Loire vs Pays de la Loire, PAC apostrophe — both forms in use officially).
  • Overseas collectivity native/name cleanups (PM, NC, TF, WF) — PR-D scope.
  • fips_code coverage gap (27/124 FR records have FIPS) — not in data.gouv.fr feeds.
  • Likely-stale timezone on 973 (set to Europe/Paris, should be America/Cayenne).

Full audit in .github/fixes-docs/FIX_1352_PR_B_SUMMARY.md.

Test plan

  • python3 bin/scripts/fixes/france_states_diff.py reports Department deltas: 0, Region deltas: 0.
  • python3 -m json.tool contributions/states/states.json — JSON valid.
  • FR record count unchanged: 124 before, 124 after.
  • All id values still globally unique across states.json (5,296 ids).
  • Every FR country_id/country_code resolves correctly against contributions/countries/countries.json.
  • .github/scripts/utils.js validateRecord against the 13 touched records: 0 errors, identical warning footprint to untouched FR records.
  • FR type distribution after change: 95 dept / 13 metro region / 5 overseas region / 5 overseas collectivity / 2 metro collectivity w/ special status / 1 European collectivity / 1 overseas collectivity w/ special status / 1 overseas territory / 1 dependency = 124.
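Two of the checks above (globally unique `id` values and the FR type distribution) can be reproduced with a short sketch like the following. It is not the validation code used in the repository; the helper names are hypothetical, and the keys of `REPORTED_FR_TYPES` are the shorthand labels from the test plan, not necessarily the exact `type` strings stored in `states.json`.

```python
from collections import Counter

# Counts as reported in the test plan (labels are shorthand, see note above).
REPORTED_FR_TYPES = {
    "dept": 95,
    "metro region": 13,
    "overseas region": 5,
    "overseas collectivity": 5,
    "metro collectivity w/ special status": 2,
    "European collectivity": 1,
    "overseas collectivity w/ special status": 1,
    "overseas territory": 1,
    "dependency": 1,
}


def duplicate_ids(records):
    """Return the sorted list of id values that occur more than once."""
    counts = Counter(r["id"] for r in records)
    return sorted(rid for rid, n in counts.items() if n > 1)


def type_distribution(records, country_code="FR"):
    """Count records per `type` for one country."""
    return Counter(
        r["type"] for r in records if r.get("country_code") == country_code
    )


# Illustrative records only; the real check would load states.json with json.load.
sample = [
    {"id": 1, "country_code": "FR", "type": "department"},
    {"id": 2, "country_code": "FR", "type": "metropolitan region"},
    {"id": 2, "country_code": "DE", "type": "state"},
]
total_reported = sum(REPORTED_FR_TYPES.values())
sample_duplicates = duplicate_ids(sample)
sample_types = type_distribution(sample)
```

Summing the reported distribution gives 124, which matches the FR record count stated in the test plan.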

Refs: #1352 (do not close — only PR-B of 4)

🤖 Generated with Claude Code

dr5hn (Owner, Author) commented Apr 27, 2026

Weekly data-quality review (2026-04-27)

Verdict: clean

Checks

  • Schema: ✅ Only existing records modified; no new records; no auto-managed fields (flag, created_at, updated_at, id) touched.
  • FK integrity: ✅ Only native and type fields changed; no FK fields modified.
  • Coordinates: ✅ No coordinate changes.
  • Wikidata: N/A (no Wikidata field changes)
  • Naming convention: ✅ Correctly fixes:
    • 11 FR department native values that contained garbage text (e.g. "Se faire" → "Ain", "Miuse" → "Meuse", "Peau" → "Haut-Rhin") against authoritative data.gouv.fr nom.
    • French Guiana native from English "French Guiana" → French "Guyane" ✅ (canonical fix: English belongs in name, French in native).
    • Corse native typo "Corsée" → "Corse" ✅; type reclassified to "metropolitan region" in line with data.gouv.fr regions feed and ISO 3166-2.

Advisory (non-blocking)

🤖 Automated weekly review — Claude (sonnet-4-6).


Generated by Claude Code

…PR-B)

Diff repo's 124 FR state records against the authoritative data.gouv.fr
INSEE feeds (departements + regions) and apply 14 conservative fixes:

- 11 metropolitan department `native` fields had garbage text fragments
  (e.g. Ain="Se faire", Aude="entendus", Var="Notre"). Repaired against
  the feed's `nom`.
- French Guiana (973): `native` was the English string; corrected to
  `Guyane` per the overseas DROM feed.
- Corse (20R): reclassified from
  `metropolitan collectivity with special status` → `metropolitan region`
  (data.gouv.fr lists Corse in the regions feed; Wikipedia/INSEE confirm
  it is one of France's 18 regions). Also fixed `native` typo
  `Corsée` → `Corse`.

Adds `bin/scripts/fixes/france_states_diff.py` — a read-only diagnostic
that reports real deltas, advisory typography, coverage, and the records
outside the feed scope (overseas collectivities, dependencies, etc.).

Out of scope for PR-B (cities = PR-A; overseas city files = PR-D):
- Region native typography drift (Grand-Est/Grand Est, etc.)
- Overseas collectivity `native`/`name` cleanups (PM, NC, TF, WF)
- `fips_code` coverage gap (27/124; not in data.gouv.fr feeds)
- Possible timezone errors on DROM (e.g. 973 set to Europe/Paris)

See `.github/fixes-docs/FIX_1352_PR_B_SUMMARY.md` for the full audit.

Refs: #1352

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dr5hn dr5hn force-pushed the feat/issue-1352-france-states-polish branch from a5c725c to 6ac8f45 Compare April 27, 2026 15:37
@dr5hn dr5hn marked this pull request as ready for review April 27, 2026 15:37
Copilot AI review requested due to automatic review settings April 27, 2026 15:37
@dosubot dosubot Bot added the size:L label (this PR changes 100-499 lines, ignoring generated files) Apr 27, 2026
@dr5hn dr5hn merged commit 94d22dd into master Apr 27, 2026
2 checks passed
@dr5hn dr5hn deleted the feat/issue-1352-france-states-polish branch April 27, 2026 15:37
@dosubot dosubot Bot added the enhancement label (new feature or request) Apr 27, 2026

Copilot AI left a comment

Pull request overview

Polishes France subdivision metadata by aligning selected native names and the Corse region classification in the contributions dataset, and adds a read-only diagnostic script plus documentation to audit FR department/region deltas against data.gouv.fr feeds.

Changes:

  • Fixes corrupted/incorrect native values for 11 metropolitan departments and French Guiana; corrects Corse native typo.
  • Reclassifies Corse (FR-20R) from special-status collectivity to metropolitan region.
  • Adds bin/scripts/fixes/france_states_diff.py (read-only comparison tool) and a companion audit summary doc.

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated no comments.

| File | Description |
|------|-------------|
| contributions/states/states.json | Updates FR department/region native values and changes Corse type to metropolitan region. |
| bin/scripts/fixes/france_states_diff.py | New script to diff FR repo records vs data.gouv.fr department/region feeds and report deltas/coverage. |
| .github/fixes-docs/FIX_1352_PR_B_SUMMARY.md | Documents scope, rationale, and validation results for PR-B of issue #1352. |
Comments suppressed due to low confidence (1)

contributions/states/states.json:181883

  • The French Guiana (iso2=973) state record still has timezone set to Europe/Paris, but French Guiana’s IANA timezone is America/Cayenne (also reflected in contributions/countries/countries.json for country_code=GF). Since this PR already touches this record, consider correcting the timezone here (and potentially auditing the other overseas regions’ timezones in a follow-up).

dr5hn added a commit that referenced this pull request Apr 27, 2026
…hy (#1489)

Customer-facing follow-up to #1349 (Italy) and #1352 (France). Cities
were re-parented onto departments (FR) and provinces (IT) by #1395 /
#1394 / #1393 / #1400 / #1484, but the state records themselves still
carried inconsistent 'level' values, blocking downstream filters like
"all departments == level=2" or "all regions == level=1".

bin/scripts/fixes/states_level_normalise.py drives the change:
  - FR: 29 region-tier rows None -> 1 (13 metro regions, 3 special
        metro collectivities incl. Corse + Alsace + Métropole de Lyon,
        13 overseas regions/collectivities/territories/dependency).
        95 metropolitan departments unchanged at level=2.
  - IT: 103 rows updated. Final state: 20 at level=1
        (15 region + 5 autonomous region) and 106 at level=2
        (80 province + 14 metropolitan city + 6 free municipal
        consortium + 4 decentralized regional entity + 2 autonomous
        province).

Only the 'level' field is touched; idempotent on re-run; non-FR/IT
states untouched.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
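
The idempotence property claimed in the commit message above can be illustrated with a minimal sketch. This is not the code of `states_level_normalise.py`; the function name, the `level_by_type` mapping, and the `type` strings other than "metropolitan region" are assumptions for illustration only.

```python
def normalise_levels(records, level_by_type, countries=("FR", "IT")):
    """Set `level` from `type` for the given countries; return the number of rows changed.

    Only the `level` key is written, and only when it differs from the target,
    so a second run over the same data changes nothing.
    """
    changed = 0
    for rec in records:
        if rec.get("country_code") not in countries:
            continue  # states outside FR/IT are left untouched
        target = level_by_type.get(rec.get("type"))
        if target is not None and rec.get("level") != target:
            rec["level"] = target
            changed += 1
    return changed


# Hypothetical mapping: region-tier types map to 1, department-tier types to 2.
level_by_type = {"metropolitan region": 1, "department": 2}

rows = [
    {"country_code": "FR", "type": "metropolitan region", "level": None},
    {"country_code": "FR", "type": "department", "level": 2},
    {"country_code": "DE", "type": "state", "level": None},
]
first_pass = normalise_levels(rows, level_by_type)
second_pass = normalise_levels(rows, level_by_type)
```

Returning the change count makes the idempotence easy to verify: the first pass reports the rows it updated and an immediate second pass reports zero.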
Labels

enhancement, size:L
