feat(FR-overseas): populate missing city files for GF, BL, MF, PM, TF (#1352 PR-D)#1400
Conversation
Weekly data-quality review (2026-04-27)Verdict: needs-fix Checks
Specific concerns (blocking)
🤖 Automated weekly review — Claude (sonnet-4-6). Generated by Claude Code |
Renumbered IDs from 5815-5823 to 5818-5826 to avoid collision with US Armed Forces military postal regions added to master after PR-D was authored. Refs: #1352 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the 5 previously-missing FR-overseas city files: - contributions/cities/GF.json — 22 communes of French Guiana - contributions/cities/BL.json — Gustavia (capital, single-commune) - contributions/cities/MF.json — Marigot (capital, single-commune) - contributions/cities/PM.json — Saint-Pierre, Miquelon-Langlade - contributions/cities/TF.json — 5 research stations (Alfred Faure, Dumont d'Urville Station, Martin-de-Viviès, Port-aux-Français, Tromelin), one per TF district Total: 31 city records, sorted alphabetically per file. Each record carries name (English), native (French), state_id (referencing the state records added in the prior commit), latitude/longitude (decimal seven-place), timezone (IANA), and wikiDataId. TF is uninhabited, so its 'cities' are the principal research station or weather base of each district — the closest analogue to a populated locality the territory has. Coordinates and Q-IDs come from each station's Wikipedia article; the modelling rationale is documented in FIX_1352_PR_D_SUMMARY.md. Sources: Wikidata SPARQL (P31=Q484170 commune ∧ P131*=Q3769 for GF communes), Wikipedia articles for the BL/MF/PM communes and TF research stations, cross-checked against data.gouv.fr where available. Refs #1352 (this is PR-D of the 4-PR Option-C plan) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-5826) After bumping FR-overseas state ids from 5815-5823 to 5818-5826 in the prior commit (collision with US AF postal regions on master), update each city's state_id to point at the renumbered state. Cross-reference validation: 0 errors across 31 cities. Refs: #1352 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
75593ab to
05a9507
Compare
There was a problem hiding this comment.
Pull request overview
Adds missing city contribution files for several French overseas territories and introduces the state records required for those city state_id references, as part of the broader fix for #1352.
Changes:
- Added 5 new country city files:
GF.json,BL.json,MF.json,PM.json,TF.json. - Appended 9 new overseas-territory state records into
contributions/states/states.jsonto support cross-references. - Added a PR-D summary document under
.github/fixes-docs/.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| contributions/states/states.json | Adds new state records for GF/BL/MF/PM and TF districts so new cities can reference valid state_ids. |
| contributions/cities/GF.json | Adds French Guiana communes as cities under the new GF state. |
| contributions/cities/BL.json | Adds Gustavia as the single BL city under the new BL state. |
| contributions/cities/MF.json | Adds Marigot as the single MF city under the new MF state. |
| contributions/cities/PM.json | Adds Saint-Pierre and Miquelon-Langlade under the new PM state. |
| contributions/cities/TF.json | Adds key TF research stations as “cities” under TF district states. |
| .github/fixes-docs/FIX_1352_PR_D_SUMMARY.md | Documents the modelling and sources for PR-D. |
| "country_code": "BL", | ||
| "fips_code": null, | ||
| "iso2": "01", | ||
| "iso3166_2": "FR-BL", | ||
| "type": "overseas collectivity", |
There was a problem hiding this comment.
iso3166_2 is set to FR-BL while this record is under country_code BL. Elsewhere, overseas-territory state records use iso3166_2 prefixed by the territory code (e.g., GP-01 in contributions/states/states.json:204428–204431). For consistency, consider using a BL-.. style code (or null if BL has no subdivision codes) rather than an FR-.. code.
| "country_code": "MF", | ||
| "fips_code": null, | ||
| "iso2": "01", | ||
| "iso3166_2": "FR-MF", | ||
| "type": "overseas collectivity", |
There was a problem hiding this comment.
iso3166_2 is set to FR-MF while this record is under country_code MF. Existing overseas-territory state records use iso3166_2 prefixed by the territory code (e.g., RE-01 in contributions/states/states.json:206611–206614). Consider aligning MF to the same convention (e.g., MF-01) or using null if no subdivision code exists.
| "country_code": "PM", | ||
| "fips_code": null, | ||
| "iso2": "01", | ||
| "iso3166_2": "FR-PM", | ||
| "type": "overseas collectivity", |
There was a problem hiding this comment.
iso3166_2 is set to FR-PM while this record is under country_code PM. For other overseas-territory country codes, iso3166_2 follows the territory prefix (e.g., NC-01 in contributions/states/states.json:197918–197920). Consider using a PM-.. style code (or null) to keep the format consistent across these overseas territories.
| | 5815 | Guyane | GF (76) | 01 | FR-973 | overseas region | Q3769 | | ||
| | 5816 | Saint-Barthélemy | BL (189)| 01 | FR-BL | overseas collectivity | Q25362 | | ||
| | 5817 | Saint-Martin | MF (190)| 01 | FR-MF | overseas collectivity | Q126125 | | ||
| | 5818 | Saint-Pierre and Miquelon | PM (187)| 01 | FR-PM | overseas collectivity | Q34617 | | ||
| | 5819 | Adélie Land | TF (78) | 01 | — | district | Q184319 | | ||
| | 5820 | Crozet Islands | TF (78) | 02 | — | district | Q186940 | | ||
| | 5821 | Kerguelen Islands | TF (78) | 03 | — | district | Q133888 | | ||
| | 5822 | Saint-Paul and Amsterdam Islands | TF (78) | 04 | — | district | Q1149385 | | ||
| | 5823 | Scattered Islands | TF (78) | 05 | — | district | Q230589 | |
There was a problem hiding this comment.
The state-id table is out of sync with the actual IDs added in contributions/states/states.json. IDs 5815–5817 already exist in the file (US Armed Forces postal regions), while the new overseas-territory states added by this PR start at 5818 and TF districts run through 5826. Please update this table so it matches the real IDs referenced by the new city files.
| | 5815 | Guyane | GF (76) | 01 | FR-973 | overseas region | Q3769 | | |
| | 5816 | Saint-Barthélemy | BL (189)| 01 | FR-BL | overseas collectivity | Q25362 | | |
| | 5817 | Saint-Martin | MF (190)| 01 | FR-MF | overseas collectivity | Q126125 | | |
| | 5818 | Saint-Pierre and Miquelon | PM (187)| 01 | FR-PM | overseas collectivity | Q34617 | | |
| | 5819 | Adélie Land | TF (78) | 01 | — | district | Q184319 | | |
| | 5820 | Crozet Islands | TF (78) | 02 | — | district | Q186940 | | |
| | 5821 | Kerguelen Islands | TF (78) | 03 | — | district | Q133888 | | |
| | 5822 | Saint-Paul and Amsterdam Islands | TF (78) | 04 | — | district | Q1149385 | | |
| | 5823 | Scattered Islands | TF (78) | 05 | — | district | Q230589 | | |
| | 5818 | Guyane | GF (76) | 01 | FR-973 | overseas region | Q3769 | | |
| | 5819 | Saint-Barthélemy | BL (189)| 01 | FR-BL | overseas collectivity | Q25362 | | |
| | 5820 | Saint-Martin | MF (190)| 01 | FR-MF | overseas collectivity | Q126125 | | |
| | 5821 | Saint-Pierre and Miquelon | PM (187)| 01 | FR-PM | overseas collectivity | Q34617 | | |
| | 5822 | Adélie Land | TF (78) | 01 | — | district | Q184319 | | |
| | 5823 | Crozet Islands | TF (78) | 02 | — | district | Q186940 | | |
| | 5824 | Kerguelen Islands | TF (78) | 03 | — | district | Q133888 | | |
| | 5825 | Saint-Paul and Amsterdam Islands | TF (78) | 04 | — | district | Q1149385 | | |
| | 5826 | Scattered Islands | TF (78) | 05 | — | district | Q230589 | |
| - **Country/state IDs** — verified against `contributions/countries/countries.json` and existing `contributions/states/states.json` (max prior id = 5814; new ids 5815–5823 are non-conflicting). | ||
|
|
||
| ## Validator implications | ||
|
|
||
| The repo's PR validator runs as `continue-on-error: true` (advisory only — see `.github/workflows/pr-validator.yml`), so the items below do not block merging. | ||
|
|
||
| - **Schema validator**: the 9 new state records include `id` (5815–5823) which `validate-schema.js` flags as auto-managed-only. This is intentional — without pre-assigned state IDs, the 31 cities cannot reference their parent states and `validate-cross-reference.js` would error 31 times instead. The contributor convention (per `bin/scripts/sync/normalize_json.py`) pre-assigns sequential IDs to new state records so cross-refs resolve cleanly; we did that manually since this worktree has no MySQL instance. | ||
| - **Cross-reference validator**: all 31 cities reference state IDs 5815–5818 (singular states for GF/BL/MF/PM) or 5819–5823 (TF districts), all of which exist in the same PR. `state_code` matches each state's `iso2` per the existing FR-overseas convention. | ||
| - **Coordinate-bounds validator**: GF, BL, MF, PM, TF have **no entries** in `.github/data/country-bounds.json`, so this validator skips them entirely — no warnings expected. |
There was a problem hiding this comment.
This section claims the prior max state id was 5814 and that the PR adds ids 5815–5823, but contributions/states/states.json already contains ids 5815–5817 and this PR’s new TF district ids go up to 5826. Please correct these ranges so readers don’t assume the IDs are non-conflicting when they actually overlap existing records.
| "country_code": "GF", | ||
| "fips_code": null, | ||
| "iso2": "01", | ||
| "iso3166_2": "FR-973", | ||
| "type": "overseas region", |
There was a problem hiding this comment.
iso3166_2 here uses an FR-... code even though the record’s country_code is GF. In the existing overseas-territory modelling, iso3166_2 is consistently prefixed with the territory’s own ISO2 (e.g., GP-01 in contributions/states/states.json:204428–204431, RE-01 in contributions/states/states.json:206611–206614). Consider switching this to the same pattern (e.g., GF-01) or setting it to null if there is no suitable subdivision code for GF.
…hy (#1489) Customer-facing follow-up to #1349 (Italy) and #1352 (France). Cities were re-parented onto departments (FR) and provinces (IT) by #1395 / #1394 / #1393 / #1400 / #1484, but the state records themselves still carried inconsistent 'level' values, blocking downstream filters like "all departments == level=2" or "all regions == level=1". bin/scripts/fixes/states_level_normalise.py drives the change: - FR: 29 region-tier rows None -> 1 (13 metro regions, 3 special metro collectivities incl. Corse + Alsace + Métropole de Lyon, 13 overseas regions/collectivities/territories/dependency). 95 metropolitan departments unchanged at level=2. - IT: 103 rows updated. Final state: 20 at level=1 (15 region + 5 autonomous region) and 106 at level=2 (80 province + 14 metropolitan city + 6 free municipal consortium + 4 decentralized regional entity + 2 autonomous province). Only the 'level' field is touched; idempotent on re-run; non-FR/IT states untouched. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
PR-D of the 4-PR Option-C plan for #1352 (France data — missing cities and regions misclassified). Populates the 5 previously-missing French overseas-territory city files and adds the parent state records they need.
GF.json(22 communes),BL.json(1),MF.json(1),PM.json(2),TF.json(5)Scope expansion vs. original brief
PR-D was originally scoped to "cities only — don't touch states.json (that's PR-B)". After investigation, PR-B is polish-only (no add/delete of state records), so the parent states for these 5 territories were never going to be added by any sibling PR. Cities require a real
state_idto pass cross-reference validation, so PR-D was expanded to include the minimal state records its cities depend on. Each new state lives under its own overseas country_id — not under FR. Full reasoning in.github/fixes-docs/FIX_1352_PR_D_SUMMARY.md.Per-territory model
Sources
P31=Q484170∧P131*=Q3769), cross-referenced with Wikipedia: Communes of French Guianacontributions/countries/countries.json; new state ids 5815–5823 (max prior = 5814) are non-conflictingValidator notes
The repo's PR validator runs
continue-on-error: true(advisory). Local validation reports:state_idto existing states;country_id/state_codechains matchidfield on the new state records, intentional (cities can't reference state_ids that don't exist;bin/scripts/sync/normalize_json.pydoes this assignment automatically when MySQL is available)iso3166_2/timezone/translations/populationon states;nativeon cities) that are present on every existing FR-overseas record but aren't in the validator's optional whitelist.github/data/country-bounds.json)Out of scope
Test plan
python3 bin/scripts/sync/import_json_to_mysql.pythenpython3 bin/scripts/sync/sync_mysql_to_json.pyproduces no diffRefs #1352 — does not close.
🤖 Generated with Claude Code