
feat(FR-overseas): populate missing city files for GF, BL, MF, PM, TF (#1352 PR-D)#1400

Merged
dr5hn merged 3 commits into master from feat/issue-1352-overseas-city-files on Apr 27, 2026

@dr5hn
Owner

@dr5hn dr5hn commented Apr 25, 2026

Summary

PR-D of the 4-PR Option-C plan for #1352 (France data — missing cities and regions misclassified). Populates the 5 previously-missing French overseas-territory city files and adds the parent state records they need.

  • 9 new state records (ids 5815–5823) under each territory's own country (GF=76, BL=189, MF=190, PM=187, TF=78) — matches the modelling pattern already used by MQ/GP/NC/PF/RE/YT/WF
  • 31 new city records across GF.json (22 communes), BL.json (1), MF.json (1), PM.json (2), TF.json (5)

Scope expansion vs. original brief

PR-D was originally scoped to "cities only — don't touch states.json (that's PR-B)". After investigation, PR-B is polish-only (no add/delete of state records), so the parent states for these 5 territories were never going to be added by any sibling PR. Cities require a real state_id to pass cross-reference validation, so PR-D was expanded to include the minimal state records its cities depend on. Each new state lives under its own overseas country_id — not under FR. Full reasoning in .github/fixes-docs/FIX_1352_PR_D_SUMMARY.md.

Per-territory model

| Territory | Country id | States | Cities | Notes |
|---|---|---|---|---|
| GF | 76 | 1 (Guyane, single overseas region) | 22 communes | All 22 communes per data.gouv.fr / Wikidata |
| BL | 189 | 1 (collectivity) | 1 (Gustavia) | Single-commune territory |
| MF | 190 | 1 (collectivity) | 1 (Marigot) | Single-commune territory |
| PM | 187 | 1 (collectivity) | 2 (Saint-Pierre, Miquelon-Langlade) | Mirrors BL/MF model since no PM precedent existed |
| TF | 78 | 5 (one per district) | 5 (research stations) | Uninhabited — "cities" are principal research stations / weather bases |

Sources

  • GF communes — Wikidata SPARQL (P31=Q484170 ∧ P131*=Q3769), cross-referenced with Wikipedia: Communes of French Guiana
  • BL/MF/PM/TF — Wikipedia articles for each commune / station; Q-IDs and decimal coordinates from Wikidata
  • Country/state IDs — verified against contributions/countries/countries.json; new state ids 5815–5823 (max prior = 5814) are non-conflicting
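
The Wikidata filter named above (instance of Q484170, located transitively in Q3769) could be expressed roughly as follows. This is a minimal sketch, not the query actually used for this PR: the helper name `build_commune_query`, the selected variables, and the optional P625 coordinate clause are assumptions added for illustration.

```python
def build_commune_query(commune_class: str = "Q484170", territory: str = "Q3769") -> str:
    """Build a SPARQL query for items that are an instance (P31) of
    `commune_class` and located, directly or transitively (P131*), in `territory`.

    Hypothetical helper: the PR only states the P31/P131* filter, not the full query.
    """
    return (
        "SELECT ?item ?itemLabel ?coord WHERE {\n"
        f"  ?item wdt:P31 wd:{commune_class} ;\n"
        f"        wdt:P131* wd:{territory} .\n"
        # P625 is Wikidata's "coordinate location" property (assumed useful here).
        "  OPTIONAL { ?item wdt:P625 ?coord . }\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "fr,en" . }\n'
        "}\n"
        "ORDER BY ?itemLabel"
    )


query = build_commune_query()
print(query)
```

The resulting string can be pasted into the Wikidata Query Service; the sketch deliberately makes no network call.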

Validator notes

The repo's PR validator runs with continue-on-error: true (advisory). Local validation reports:

  • 0 cross-reference errors — all 31 cities resolve their state_id to existing states; country_id/state_code chains match
  • 9 schema errors — the auto-managed id field on the new state records, intentional (cities can't reference state_ids that don't exist; bin/scripts/sync/normalize_json.py does this assignment automatically when MySQL is available)
  • 67 warnings — unknown fields (iso3166_2/timezone/translations/population on states; native on cities) that are present on every existing FR-overseas record but aren't in the validator's optional whitelist
  • Coordinate-bounds validator — skips all 5 (none are in .github/data/country-bounds.json)
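
The cross-reference check described above can be sketched as below. This is an illustration under stated assumptions, not the repository's validator: the function name `cross_reference_errors` and the sample records are hypothetical, and only the field names (`state_id`, `state_code`, `country_id`, `iso2`) come from the PR.

```python
from typing import Dict, List


def cross_reference_errors(cities: List[dict], states: List[dict]) -> List[str]:
    """Return one message per city whose parent-state chain does not resolve."""
    states_by_id: Dict[int, dict] = {s["id"]: s for s in states}
    errors: List[str] = []
    for city in cities:
        state = states_by_id.get(city["state_id"])
        if state is None:
            errors.append(f"{city['name']}: state_id {city['state_id']} not found")
            continue
        if city["country_id"] != state["country_id"]:
            errors.append(f"{city['name']}: country_id does not match parent state")
        if city["state_code"] != state["iso2"]:
            errors.append(f"{city['name']}: state_code does not match state iso2")
    return errors


# Illustrative records only; the ids follow the ranges quoted in the description.
states = [{"id": 5816, "country_id": 189, "country_code": "BL", "iso2": "01"}]
cities = [
    {"name": "Gustavia", "state_id": 5816, "state_code": "01", "country_id": 189},
    {"name": "Orphan (hypothetical)", "state_id": 9999, "state_code": "01", "country_id": 189},
]
errors = cross_reference_errors(cities, states)
print(errors)
```

A city whose `state_id` has no matching state record is exactly the case that forced the state records into this PR.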

Out of scope

  • FR.json city additions — PR-A
  • FR mainland states polish — PR-B
  • FR / FR-overseas modelling docs — PR-C

Test plan

  • Schema validator passes / has only documented warnings
  • Cross-reference validator passes for all 31 city records
  • Visual spot-check of coordinates (e.g. Cayenne 4.94°N 52.34°W, Port-aux-Français 49.35°S 70.22°E)
  • Round-trip: python3 bin/scripts/sync/import_json_to_mysql.py then python3 bin/scripts/sync/sync_mysql_to_json.py produces no diff
  • No duplicate cities created in territories that previously had files
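
For the visual spot-check, it can help to render signed decimal coordinates in hemisphere notation so that a wrong sign is easy to see. The helper below is a small sketch; `format_coordinate` is a hypothetical name and is not part of the repository's tooling.

```python
def format_coordinate(lat: float, lon: float, places: int = 2) -> str:
    """Render signed decimal degrees as unsigned degrees plus hemisphere letters."""
    ns = "N" if lat >= 0 else "S"   # negative latitude means southern hemisphere
    ew = "E" if lon >= 0 else "W"   # negative longitude means western hemisphere
    return f"{abs(lat):.{places}f}°{ns} {abs(lon):.{places}f}°{ew}"


print(format_coordinate(4.94, -52.34))    # Cayenne
print(format_coordinate(-49.35, 70.22))   # Port-aux-Français
```

The hemisphere letter replaces the sign, so the rendered value carries either a sign or a letter, never both.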

Refs #1352 — does not close.

🤖 Generated with Claude Code

Owner Author

dr5hn commented Apr 27, 2026

Weekly data-quality review (2026-04-27)

Verdict: needs-fix

Checks

  • Schema: ❌ New state records in contributions/states/states.json include explicit id fields (5815–5823). CLAUDE.md §Important Rules: "Omit id for new records (auto-assigned)." Pre-assigning IDs bypasses MySQL AUTO_INCREMENT and may leave the counter out of sync, risking future ID collisions. Needs explicit maintainer sign-off or resolution via normalize_json.py against a live MySQL instance.
  • FK integrity: ✅ All 31 new city records resolve state_id to newly-added states in the same PR. country_id/country_code chains correct: GF=76, BL=189, MF=190, PM=187, TF=78.
  • Coordinates: ✅ GF/BL/MF/PM/TF absent from .github/data/country-bounds.json — confirmed, coordinate validator skips all five territories. Spot-checked geography: Cayenne 4.94°N 52.34°W ✅; Port-aux-Français 49.35°S 70.22°E ✅; Gustavia 17.90°N 62.85°W ✅; Tromelin 15.89°S 54.52°E ✅; Dumont d'Urville 66.66°S 140.00°E ✅.
  • Wikidata: N/A — Wikidata API returned 403 on all requests during this review (network failure, treated as N/A per review protocol).
  • Naming convention: ❌ contributions/cities/GF.json — the record for Montsinéry-Tonnegrande has "native": "Montsinéry-Tonnégrande" (spurious acute on the second 'e') while the name field and the INSEE canonical spelling are both "Montsinéry-Tonnegrande" (no accent). Fix: align native to match the name field.

Specific concerns (blocking)

  1. Pre-assigned id on new state records (contributions/states/states.json, ids 5815–5823) — Schema deviation from CLAUDE.md. The PR rationale (FK dependency, no MySQL available) is understandable, but the risk of an AUTO_INCREMENT counter gap must be addressed before merge. Options: (a) run normalize_json.py after a local MySQL round-trip so MySQL assigns the IDs, or (b) maintainer explicitly accepts the pre-assigned IDs and ensures ALTER TABLE states AUTO_INCREMENT = 5824; is run at import time.

  2. Montsinéry-Tonnegrande native typo (contributions/cities/GF.json) — Change "native": "Montsinéry-Tonnégrande" to "native": "Montsinéry-Tonnegrande" (drop accent on second 'e'). INSEE canonical spelling confirmed without the accent.

  3. Cross-PR FIX_1039 postal conflict: contributions/countries/countries.json (+25/−25) and .github/fixes-docs/FIX_1039_SUMMARY.md are identical to those in PR #1392 (docs: multi-level territories policy (FR overseas, dual representation), #1352 PR-C), PR #1393 (feat(FR): polish department + region metadata vs data.gouv.fr, #1352 PR-B), and PR #1394 (feat(FR): diff cities against data.gouv.fr, add missing metropolitan communes, #1352 PR-A). The first to merge will succeed; subsequent merges will conflict on both files. The FIX_1039 changes should live in exactly one PR before any of the four are merged.
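
For concern 1, option (b), the value to pass to ALTER TABLE and a pre-merge collision check can be derived from the id sets. The sketch below is only an illustration: the function names are hypothetical, and the short `existing_ids` list stands in for the real contents of states.json.

```python
from typing import Iterable, Set


def colliding_ids(existing: Iterable[int], new: Iterable[int]) -> Set[int]:
    """Ids that appear both in the existing data and in the proposed records."""
    return set(existing) & set(new)


def next_auto_increment(existing: Iterable[int], new: Iterable[int]) -> int:
    """Smallest counter value that is above every id already in use."""
    return max(set(existing) | set(new)) + 1


# Stand-in sample: the review states a prior max of 5814 and new ids 5815-5823.
existing_ids = [5812, 5813, 5814]
new_ids = list(range(5815, 5824))

collisions = colliding_ids(existing_ids, new_ids)
next_id = next_auto_increment(existing_ids, new_ids)
statement = f"ALTER TABLE states AUTO_INCREMENT = {next_id};"
print(collisions, statement)
```

Running the collision check against the current master branch, rather than against the branch point, is what would catch ids added by other PRs in the meantime.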

🤖 Automated weekly review — Claude (sonnet-4-6).


Generated by Claude Code

dr5hn and others added 3 commits April 27, 2026 21:10
Renumbered IDs from 5815-5823 to 5818-5826 to avoid collision with
US Armed Forces military postal regions added to master after PR-D
was authored.

Refs: #1352

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the 5 previously-missing FR-overseas city files:

- contributions/cities/GF.json — 22 communes of French Guiana
- contributions/cities/BL.json — Gustavia (capital, single-commune)
- contributions/cities/MF.json — Marigot (capital, single-commune)
- contributions/cities/PM.json — Saint-Pierre, Miquelon-Langlade
- contributions/cities/TF.json — 5 research stations (Alfred Faure,
  Dumont d'Urville Station, Martin-de-Viviès, Port-aux-Français,
  Tromelin), one per TF district

Total: 31 city records, sorted alphabetically per file. Each record
carries name (English), native (French), state_id (referencing the
state records added in the prior commit), latitude/longitude (decimal
seven-place), timezone (IANA), and wikiDataId.

TF is uninhabited, so its 'cities' are the principal research station
or weather base of each district — the closest analogue to a populated
locality the territory has. Coordinates and Q-IDs come from each
station's Wikipedia article; the modelling rationale is documented in
FIX_1352_PR_D_SUMMARY.md.

Sources: Wikidata SPARQL (P31=Q484170 commune ∧ P131*=Q3769 for GF
communes), Wikipedia articles for the BL/MF/PM communes and TF
research stations, cross-checked against data.gouv.fr where available.

Refs #1352 (this is PR-D of the 4-PR Option-C plan)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-5826)

After bumping FR-overseas state ids from 5815-5823 to 5818-5826 in the
prior commit (collision with US AF postal regions on master), update
each city's state_id to point at the renumbered state.

Cross-reference validation: 0 errors across 31 cities.

Refs: #1352

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
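
The renumbering described in these commits amounts to shifting every affected `state_id` by 3. A minimal sketch, assuming a list of city dicts already loaded from a city file; `renumber_state_ids` is a hypothetical helper and not a script from the repository.

```python
from typing import List


def renumber_state_ids(
    cities: List[dict], old_start: int = 5815, old_end: int = 5823, offset: int = 3
) -> List[dict]:
    """Return copies of the city records with state_ids in the old range shifted."""
    updated = []
    for city in cities:
        state_id = city["state_id"]
        if old_start <= state_id <= old_end:
            # Copy instead of mutating so the input list stays comparable.
            city = {**city, "state_id": state_id + offset}
        updated.append(city)
    return updated


# Illustrative records; the second one is outside the old range and is untouched.
sample = [
    {"name": "Gustavia", "state_id": 5816},
    {"name": "Elsewhere (hypothetical)", "state_id": 1234},
]
print(renumber_state_ids(sample))
```

Because the old range (5815 to 5823) and the new range (5818 to 5826) overlap, a shift like this is not idempotent and should be applied exactly once.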
@dr5hn dr5hn marked this pull request as ready for review April 27, 2026 15:40
@dr5hn dr5hn force-pushed the feat/issue-1352-overseas-city-files branch from 75593ab to 05a9507 Compare April 27, 2026 15:40
Copilot AI review requested due to automatic review settings April 27, 2026 15:41
@dr5hn dr5hn merged commit b61e4de into master Apr 27, 2026
2 checks passed
@dr5hn dr5hn deleted the feat/issue-1352-overseas-city-files branch April 27, 2026 15:41
@dosubot dosubot Bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Apr 27, 2026
@dosubot dosubot Bot added the enhancement New feature or request label Apr 27, 2026

Copilot AI left a comment

Pull request overview

Adds missing city contribution files for several French overseas territories and introduces the state records required for those city state_id references, as part of the broader fix for #1352.

Changes:

  • Added 5 new country city files: GF.json, BL.json, MF.json, PM.json, TF.json.
  • Appended 9 new overseas-territory state records into contributions/states/states.json to support cross-references.
  • Added a PR-D summary document under .github/fixes-docs/.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

| File | Description |
|---|---|
| contributions/states/states.json | Adds new state records for GF/BL/MF/PM and TF districts so new cities can reference valid state_ids. |
| contributions/cities/GF.json | Adds French Guiana communes as cities under the new GF state. |
| contributions/cities/BL.json | Adds Gustavia as the single BL city under the new BL state. |
| contributions/cities/MF.json | Adds Marigot as the single MF city under the new MF state. |
| contributions/cities/PM.json | Adds Saint-Pierre and Miquelon-Langlade under the new PM state. |
| contributions/cities/TF.json | Adds key TF research stations as “cities” under TF district states. |
| .github/fixes-docs/FIX_1352_PR_D_SUMMARY.md | Documents the modelling and sources for PR-D. |

Comment on lines +223208 to +223212
"country_code": "BL",
"fips_code": null,
"iso2": "01",
"iso3166_2": "FR-BL",
"type": "overseas collectivity",

Copilot AI Apr 27, 2026

iso3166_2 is set to FR-BL while this record is under country_code BL. Elsewhere, overseas-territory state records use iso3166_2 prefixed by the territory code (e.g., GP-01 in contributions/states/states.json:204428–204431). For consistency, consider using a BL-.. style code (or null if BL has no subdivision codes) rather than an FR-.. code.

Comment on lines +223227 to +223231
"country_code": "MF",
"fips_code": null,
"iso2": "01",
"iso3166_2": "FR-MF",
"type": "overseas collectivity",

Copilot AI Apr 27, 2026

iso3166_2 is set to FR-MF while this record is under country_code MF. Existing overseas-territory state records use iso3166_2 prefixed by the territory code (e.g., RE-01 in contributions/states/states.json:206611–206614). Consider aligning MF to the same convention (e.g., MF-01) or using null if no subdivision code exists.

Comment on lines +223246 to +223250
"country_code": "PM",
"fips_code": null,
"iso2": "01",
"iso3166_2": "FR-PM",
"type": "overseas collectivity",

Copilot AI Apr 27, 2026

iso3166_2 is set to FR-PM while this record is under country_code PM. For other overseas-territory country codes, iso3166_2 follows the territory prefix (e.g., NC-01 in contributions/states/states.json:197918–197920). Consider using a PM-.. style code (or null) to keep the format consistent across these overseas territories.

Comment on lines +19 to +27
| 5815 | Guyane | GF (76) | 01 | FR-973 | overseas region | Q3769 |
| 5816 | Saint-Barthélemy | BL (189)| 01 | FR-BL | overseas collectivity | Q25362 |
| 5817 | Saint-Martin | MF (190)| 01 | FR-MF | overseas collectivity | Q126125 |
| 5818 | Saint-Pierre and Miquelon | PM (187)| 01 | FR-PM | overseas collectivity | Q34617 |
| 5819 | Adélie Land | TF (78) | 01 | — | district | Q184319 |
| 5820 | Crozet Islands | TF (78) | 02 | — | district | Q186940 |
| 5821 | Kerguelen Islands | TF (78) | 03 | — | district | Q133888 |
| 5822 | Saint-Paul and Amsterdam Islands | TF (78) | 04 | — | district | Q1149385 |
| 5823 | Scattered Islands | TF (78) | 05 | — | district | Q230589 |

Copilot AI Apr 27, 2026

The state-id table is out of sync with the actual IDs added in contributions/states/states.json. IDs 5815–5817 already exist in the file (US Armed Forces postal regions), while the new overseas-territory states added by this PR start at 5818 and TF districts run through 5826. Please update this table so it matches the real IDs referenced by the new city files.

Suggested change
- | 5815 | Guyane | GF (76) | 01 | FR-973 | overseas region | Q3769 |
- | 5816 | Saint-Barthélemy | BL (189)| 01 | FR-BL | overseas collectivity | Q25362 |
- | 5817 | Saint-Martin | MF (190)| 01 | FR-MF | overseas collectivity | Q126125 |
- | 5818 | Saint-Pierre and Miquelon | PM (187)| 01 | FR-PM | overseas collectivity | Q34617 |
- | 5819 | Adélie Land | TF (78) | 01 | — | district | Q184319 |
- | 5820 | Crozet Islands | TF (78) | 02 | — | district | Q186940 |
- | 5821 | Kerguelen Islands | TF (78) | 03 | — | district | Q133888 |
- | 5822 | Saint-Paul and Amsterdam Islands | TF (78) | 04 | — | district | Q1149385 |
- | 5823 | Scattered Islands | TF (78) | 05 | — | district | Q230589 |
+ | 5818 | Guyane | GF (76) | 01 | FR-973 | overseas region | Q3769 |
+ | 5819 | Saint-Barthélemy | BL (189)| 01 | FR-BL | overseas collectivity | Q25362 |
+ | 5820 | Saint-Martin | MF (190)| 01 | FR-MF | overseas collectivity | Q126125 |
+ | 5821 | Saint-Pierre and Miquelon | PM (187)| 01 | FR-PM | overseas collectivity | Q34617 |
+ | 5822 | Adélie Land | TF (78) | 01 | — | district | Q184319 |
+ | 5823 | Crozet Islands | TF (78) | 02 | — | district | Q186940 |
+ | 5824 | Kerguelen Islands | TF (78) | 03 | — | district | Q133888 |
+ | 5825 | Saint-Paul and Amsterdam Islands | TF (78) | 04 | — | district | Q1149385 |
+ | 5826 | Scattered Islands | TF (78) | 05 | — | district | Q230589 |

Comment on lines +64 to +72
- **Country/state IDs** — verified against `contributions/countries/countries.json` and existing `contributions/states/states.json` (max prior id = 5814; new ids 5815–5823 are non-conflicting).

## Validator implications

The repo's PR validator runs as `continue-on-error: true` (advisory only — see `.github/workflows/pr-validator.yml`), so the items below do not block merging.

- **Schema validator**: the 9 new state records include `id` (5815–5823) which `validate-schema.js` flags as auto-managed-only. This is intentional — without pre-assigned state IDs, the 31 cities cannot reference their parent states and `validate-cross-reference.js` would error 31 times instead. The contributor convention (per `bin/scripts/sync/normalize_json.py`) pre-assigns sequential IDs to new state records so cross-refs resolve cleanly; we did that manually since this worktree has no MySQL instance.
- **Cross-reference validator**: all 31 cities reference state IDs 5815–5818 (singular states for GF/BL/MF/PM) or 5819–5823 (TF districts), all of which exist in the same PR. `state_code` matches each state's `iso2` per the existing FR-overseas convention.
- **Coordinate-bounds validator**: GF, BL, MF, PM, TF have **no entries** in `.github/data/country-bounds.json`, so this validator skips them entirely — no warnings expected.

Copilot AI Apr 27, 2026

This section claims the prior max state id was 5814 and that the PR adds ids 5815–5823, but contributions/states/states.json already contains ids 5815–5817 and this PR’s new TF district ids go up to 5826. Please correct these ranges so readers don’t assume the IDs are non-conflicting when they actually overlap existing records.

Comment on lines +223189 to +223193
"country_code": "GF",
"fips_code": null,
"iso2": "01",
"iso3166_2": "FR-973",
"type": "overseas region",

Copilot AI Apr 27, 2026

iso3166_2 here uses an FR-... code even though the record’s country_code is GF. In the existing overseas-territory modelling, iso3166_2 is consistently prefixed with the territory’s own ISO2 (e.g., GP-01 in contributions/states/states.json:204428–204431, RE-01 in contributions/states/states.json:206611–206614). Consider switching this to the same pattern (e.g., GF-01) or setting it to null if there is no suitable subdivision code for GF.
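
The consistency rule raised in these review comments (the `iso3166_2` prefix should equal the record's `country_code`, or the field should be null) can be checked mechanically. The sketch below is an assumption-laden illustration: `iso_prefix_mismatches` is a hypothetical helper, and the sample records reuse only the codes quoted in the comments.

```python
from typing import List, Tuple


def iso_prefix_mismatches(states: List[dict]) -> List[Tuple[str, str]]:
    """Return (country_code, iso3166_2) pairs whose prefix differs from country_code."""
    mismatches = []
    for state in states:
        code = state.get("iso3166_2")
        if code is None:
            continue  # a null code is treated as acceptable, per the review comments
        prefix = code.split("-", 1)[0]
        if prefix != state["country_code"]:
            mismatches.append((state["country_code"], code))
    return mismatches


# Codes quoted in the review; all other fields are omitted for brevity.
sample_states = [
    {"country_code": "GF", "iso3166_2": "FR-973"},
    {"country_code": "BL", "iso3166_2": "FR-BL"},
    {"country_code": "GP", "iso3166_2": "GP-01"},
    {"country_code": "TF", "iso3166_2": None},
]
print(iso_prefix_mismatches(sample_states))
```

The check only flags inconsistency with the existing convention; whether FR-prefixed codes should be kept for these territories remains a modelling decision.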

dr5hn added a commit that referenced this pull request Apr 27, 2026
…hy (#1489)

Customer-facing follow-up to #1349 (Italy) and #1352 (France). Cities
were re-parented onto departments (FR) and provinces (IT) by #1395 /
#1394 / #1393 / #1400 / #1484, but the state records themselves still
carried inconsistent 'level' values, blocking downstream filters like
"all departments == level=2" or "all regions == level=1".

bin/scripts/fixes/states_level_normalise.py drives the change:
  - FR: 29 region-tier rows None -> 1 (13 metro regions, 3 special
        metro collectivities incl. Corse + Alsace + Métropole de Lyon,
        13 overseas regions/collectivities/territories/dependency).
        95 metropolitan departments unchanged at level=2.
  - IT: 103 rows updated. Final state: 20 at level=1
        (15 region + 5 autonomous region) and 106 at level=2
        (80 province + 14 metropolitan city + 6 free municipal
        consortium + 4 decentralized regional entity + 2 autonomous
        province).

Only the 'level' field is touched; idempotent on re-run; non-FR/IT
states untouched.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
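
The "only 'level' is touched, idempotent on re-run" behaviour described in this commit message can be illustrated with a type-to-level mapping. This is a hedged sketch and not the contents of states_level_normalise.py: the mapping table is reconstructed from the IT type names listed in the message, and the function name and sample records are hypothetical.

```python
from typing import Dict, List

# Reconstructed from the commit message's IT breakdown (assumption, not the real script).
IT_LEVEL_BY_TYPE: Dict[str, int] = {
    "region": 1,
    "autonomous region": 1,
    "province": 2,
    "metropolitan city": 2,
    "free municipal consortium": 2,
    "decentralized regional entity": 2,
    "autonomous province": 2,
}


def normalise_it_levels(states: List[dict]) -> int:
    """Set 'level' on IT records from their 'type'; return how many rows changed."""
    changed = 0
    for state in states:
        if state.get("country_code") != "IT":
            continue  # states of other countries are left untouched
        target = IT_LEVEL_BY_TYPE.get(state.get("type"))
        if target is not None and state.get("level") != target:
            state["level"] = target
            changed += 1
    return changed


rows = [
    {"country_code": "IT", "type": "region", "level": None},
    {"country_code": "IT", "type": "province", "level": 2},
    {"country_code": "DE", "type": "region", "level": None},
]
first_run = normalise_it_levels(rows)
second_run = normalise_it_levels(rows)
print(first_run, second_run)
```

A second run changing zero rows is the property that makes such a fix script safe to re-run.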

Labels

enhancement New feature or request size:M This PR changes 30-99 lines, ignoring generated files.


2 participants