docs: multi-level territories policy (FR overseas, dual representation) (#1352 PR-C) #1392
Weekly data-quality review (2026-04-27)
Verdict: needs-discussion
Checks
Concerns
🤖 Automated weekly review by Claude (sonnet-4-6). Generated by Claude Code.
Adds Danish postcodes via DAWA (Danmarks Adressers Web API) — public
sector data published under CC-0 by SDFI/Dataforsyningen.
1. bin/scripts/sync/import_denmark_postcodes.py — pipeline that fetches
/kommuner to build a kommune-code -> region-name map, then resolves
each /postnumre record's region via its first kommune. Maps the 5
Danish region names to states.json iso2 codes:
Region Hovedstaden -> 84 (called "Denmark" in states.json)
Region Sjælland -> 85 (Zealand)
Region Syddanmark -> 83 (Southern Denmark)
Region Midtjylland -> 82 (Central Denmark)
Region Nordjylland -> 81 (North Denmark)
2. contributions/postcodes/DK.json — 1,089 codes covering all 5 regions
with 100% state_id + 100% coordinate resolution.
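The region resolution described in item 1 could look roughly like the sketch below. The function names and the flattened record shapes (`kode`, `region`, `kommuner`) are assumptions made for illustration; the actual DAWA payloads and the importer's code may differ. Only the region-to-iso2 table is taken from the commit message.

```python
# Hypothetical sketch, not the importer's actual code.
# Region name -> states.json iso2, as listed in the commit message.
REGION_TO_ISO2 = {
    "Region Hovedstaden": "84",
    "Region Sjælland": "85",
    "Region Syddanmark": "83",
    "Region Midtjylland": "82",
    "Region Nordjylland": "81",
}


def build_kommune_region_map(kommuner):
    """Map kommune code -> region name (record shape is assumed)."""
    return {k["kode"]: k["region"] for k in kommuner}


def resolve_state_iso2(postnummer, kommune_to_region):
    """Resolve a postcode record's iso2 via its first kommune, or None."""
    kommune_codes = postnummer.get("kommuner") or []
    if not kommune_codes:
        return None
    region = kommune_to_region.get(kommune_codes[0])
    return REGION_TO_ISO2.get(region)


# Illustrative sample records (made up for this sketch).
kommuner = [{"kode": "0101", "region": "Region Hovedstaden"}]
postnummer = {"nr": "1050", "kommuner": ["0101"]}
kommune_map = build_kommune_region_map(kommuner)
print(resolve_state_iso2(postnummer, kommune_map))
```

Returning `None` for an unknown kommune or region keeps unresolved rows visible instead of silently assigning a wrong state.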
Validation (zero errors)
- All codes match countries.postal_code_regex (^(\d{4})$)
- All FKs resolve, all state_codes agree with state.iso2
License & attribution
- Source: SDFI / Dataforsyningen DAWA (CC-0)
- Each row: source: "dawa"
Refs: #1039
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds Icelandic postcodes via the sveinbjornt/iceaddr Python package, which
embeds the canonical postcode metadata under the MIT licence.
1. bin/scripts/sync/import_iceland_postcodes.py: pipeline that dynamically
imports the iceaddr POSTCODES dict and resolves each code's region via
prefix range to states.json iso2 1-8 (Statistics Iceland's NUTS-3
boundaries):
1xx-2xx -> Capital
3xx -> Western
4xx -> Westfjords
5xx -> Northwestern
6xx -> Northeastern
7xx -> Eastern
8xx-9xx -> Southern
2. contributions/postcodes/IS.json: 195 records with 100% state_id
resolution. Locality names combine stadur_nf + lysing
(e.g. "Reykjavík, Miðborg").
License & attribution
- Source: iceaddr (MIT) embedding Pósturinn data
- Each row: source: "iceaddr"
Refs: #1039
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
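A minimal sketch of the prefix-range lookup, assuming 3-digit postcode strings. The function name is hypothetical, and the sketch deliberately stops at the region name: the commit message does not spell out which states.json iso2 value (1-8) each range maps to, so that step is left out.

```python
# Leading digit -> region name, from the prefix ranges in the commit message.
PREFIX_TO_REGION = {
    "1": "Capital",
    "2": "Capital",
    "3": "Western",
    "4": "Westfjords",
    "5": "Northwestern",
    "6": "Northeastern",
    "7": "Eastern",
    "8": "Southern",
    "9": "Southern",
}


def region_for_postcode(code: str):
    """Return the region name for a 3-digit postcode, or None if malformed."""
    if len(code) != 3 or not code.isdigit():
        return None
    return PREFIX_TO_REGION.get(code[0])


print(region_for_postcode("101"))
print(region_for_postcode("600"))
```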
…irrors (#1039)
Bundles three small-to-medium European countries with confirmed
redistributable postcode mirrors into a single batch importer.
1. bin/scripts/sync/import_eu_batch1_postcodes.py: pipeline that ingests
three different shapes (SK JSON, RO CSV, SI CSV) and writes per-country
JSON files. ASCII-folding + dash-to-space normalisation handles the
Romanian Caraș-Severin / Bistrița-Năsăud cases where the CSV uses spaces
and states.json uses hyphens.
2. contributions/postcodes/SK.json: 1,312 records (100% state via
KRAJ -> states.iso2 direct match)
3. contributions/postcodes/RO.json: 13,751 records (100% state via
ASCII-folded judet name match; all 6 Bucharest sectors mapped to 'B')
4. contributions/postcodes/SI.json: 522 records, country-only by design
(source has no municipality info; SI postcodes don't map cleanly to
administrative regions)
Validation (zero errors)
- All codes match countries.postal_code_regex
- All FKs resolve, all state_codes agree with state.iso2
License & attribution
- SK source: github.com/FeroVolar/PSC-JSON (community Slovenská pošta data)
- RO source: github.com/alexionegit/coduripostaleRomaniaPS
- SI source: github.com/dlabs/postcode_si (community Posta Slovenije data)
Refs: #1039
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
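The ASCII-folding plus dash-to-space normalisation mentioned for the Romanian judet names can be sketched with the standard library as follows. The function name is hypothetical, and the lowercasing and whitespace collapsing are extra assumptions added here to make the comparison robust; the importer may normalise differently.

```python
import unicodedata


def normalise_name(name: str) -> str:
    """ASCII-fold a name, turn hyphens into spaces, lowercase, collapse spaces."""
    # NFKD splits letters such as "ș" into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", name)
    # Dropping non-ASCII bytes removes the combining marks and keeps the base letters.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return " ".join(ascii_only.replace("-", " ").lower().split())


# A states.json-style name and a CSV-style name compare equal after normalisation.
print(normalise_name("Caraș-Severin") == normalise_name("Caras Severin"))
```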
…#1397/#1399) The remap is a behavior change for downstream consumers — region-level state_code queries (e.g. Sicily=82, Lombardy=25) now return empty arrays because cities live under provinces/metropolitan cities, not regions. Documents the traversal pattern (states.parent_id) needed for region-aggregate queries so users know how to migrate.
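A region-aggregate query along the lines of that traversal might look like the sketch below, run against an in-memory SQLite database. The table layout, the ids and the sample rows are made up for illustration; only the `states.parent_id` link and the region codes (Sicily=82, Lombardy=25) come from the commit message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE states (
        id INTEGER PRIMARY KEY, name TEXT, iso2 TEXT,
        country_code TEXT, parent_id INTEGER
    );
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, state_id INTEGER);
    -- Hypothetical ids: regions have no parent, provinces point at their region.
    INSERT INTO states VALUES
        (1, 'Sicily',   '82', 'IT', NULL),
        (2, 'Palermo',  'PA', 'IT', 1),
        (3, 'Lombardy', '25', 'IT', NULL),
        (4, 'Milan',    'MI', 'IT', 3);
    INSERT INTO cities VALUES
        (10, 'Palermo', 2), (11, 'Cefalù', 2), (12, 'Milan', 4);
    """
)


def cities_in_region(conn, country_code, region_iso2):
    """Return city names whose province's parent is the given region."""
    rows = conn.execute(
        """
        SELECT c.name
        FROM cities c
        JOIN states province ON c.state_id = province.id
        JOIN states region   ON province.parent_id = region.id
        WHERE region.country_code = ? AND region.iso2 = ?
        ORDER BY c.name
        """,
        (country_code, region_iso2),
    ).fetchall()
    return [row[0] for row in rows]


print(cities_in_region(conn, "IT", "82"))
```

The sketch walks exactly one level up; deeper hierarchies would need a recursive query.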
…n) (#1352 PR-C)
Adds MULTI_LEVEL_TERRITORIES.md documenting why 12 French overseas
territories (and analogous US/CN/NO entities) appear simultaneously as
ISO 3166-1 countries and as ISO 3166-2 subdivisions of their parent state.
Captures the maintainer's Option C decision on #1352: keep both
representations because (1) downstream API/SDK consumers filter on
country_code, (2) ISO 3166-1 lists them as countries, and (3) the breaking
change is unjustified for a labelling concern.
Cross-links the new policy doc from .claude/CLAUDE.md (Important Rules)
and README.md (contributing section). No data changes.
Refs: #1352
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
c57a9db to a30f8ea
Pull request overview
Adds a new top-level documentation page describing the repository policy for “multi-level territories” (dual ISO 3166-1 country + ISO 3166-2 subdivision representation), and links it from existing maintainer/contributor docs.
Changes:
- Add `MULTI_LEVEL_TERRITORIES.md` documenting the Option C policy for dual-modeled territories (with rationale and suggested consumer query patterns).
- Link the new policy doc from `README.md`.
- Add a cross-reference in `.claude/CLAUDE.md` to guide maintainers before changing country/state records for dual-ISO territories.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| README.md | Adds a link to the new multi-level territories policy doc in the contributing section. |
| MULTI_LEVEL_TERRITORIES.md | Introduces the policy document explaining dual representations and downstream querying guidance. |
| .claude/CLAUDE.md | Adds a maintainer-facing reminder/link to consult the new policy before modifying dual-ISO territory records. |
1. There is a row in `contributions/countries/countries.json` (with its own `id`, `iso2`, `iso3`).
2. There is a row in `contributions/states/states.json` whose `country_code` points at the **parent** state (e.g. `FR`, `US`, `CN`), and whose `iso2` / `state_code` matches the territory.
3. Cities under the territory live in `contributions/cities/<TERRITORY_ISO2>.json` (e.g. `MQ.json`), and reference both their `country_id` (= the territory) and their `state_id` (= the territory-as-subdivision-of-parent).
The Policy bullets describe territory cities as referencing state_id of the territory-as-subdivision-of-parent (e.g., FR-level state row). In the current dataset, territory city files (e.g., contributions/cities/MQ.json) use state_id for the territory’s own internal subdivisions (whose states.country_code is MQ, not FR). Please reword these bullets to match the actual model (country row + separate parent-country state row, but cities still point to the territory’s own states), or explicitly describe what would need to change in the data model for the stated state_id linkage to be true.
Suggested change:
1. There is a row in `contributions/countries/countries.json` for the territory itself (with its own `id`, `iso2`, `iso3`).
2. There is also a separate row in `contributions/states/states.json` representing the territory as a subdivision of its **parent** country (e.g. `FR`, `US`, `CN`), with `country_code` set to the parent and `iso2` / `state_code` matching the territory.
3. Cities under the territory live in `contributions/cities/<TERRITORY_ISO2>.json` (e.g. `MQ.json`), with `country_id` pointing to the territory country row. In the current data model, their `state_id` points to the territory's own internal subdivisions in `contributions/states/states.json` (where `states.country_code` is the territory ISO2, e.g. `MQ`), not to the parent-country subdivision row described in item 2.
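To make the corrected model concrete, here is a small sketch of the three kinds of rows for Martinique. The `countries.id` (138) and the parent-level `states.id` (4827) come from the table in this PR; the internal-subdivision row, its id and the city row are hypothetical, and the field sets are trimmed for illustration.

```python
# Territory as a country row (id from the PR's table).
countries = [{"id": 138, "iso2": "MQ", "name": "Martinique"}]

states = [
    # Territory as a subdivision of the parent country (id from the PR's table).
    {"id": 4827, "country_code": "FR", "name": "Martinique"},
    # Hypothetical internal subdivision of the territory itself.
    {"id": 900001, "country_code": "MQ", "name": "Fort-de-France"},
]

# Hypothetical city row: state_id points at the territory's own subdivision.
cities = [{"name": "Fort-de-France", "country_code": "MQ", "state_id": 900001}]

states_by_id = {s["id"]: s for s in states}


def state_country_for_city(city):
    """Return the country_code of the state row a city points at."""
    return states_by_id[city["state_id"]]["country_code"]


# The city's state belongs to MQ, not FR, so a join filtered on FR misses it.
print(state_country_for_city(cities[0]))
```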
| `MF` | `FR-MF` | Saint-Martin (French part) | 190 | 4809 | overseas collectivity |
| `WF` | `FR-WF` | Wallis and Futuna | 243 | 4810 | overseas collectivity |

> The five **DROM** (Départements et régions d'outre-mer), `GF`, `GP`, `MQ`, `RE`, `YT`, currently use **INSEE numeric codes** (`971`–`976`) as their `state_code` in `states.json`, while the overseas collectivities use the ISO 3166-2 alphabetic codes. Aligning the DROM to ISO 3166-2 alphabetic codes is tracked separately and is **out of scope** for this policy doc.
This note says the DROM use INSEE numeric codes as their state_code in states.json, but states.json does not contain a state_code field (it uses iso2 / iso3166_2). Consider updating the wording to reference the actual fields (states.iso2 and the corresponding cities.state_code values) so maintainers don’t look for a non-existent key.
| ISO 3166-1 | ISO 3166-2 / INSEE | Name (English) | `countries.id` | `states.id` | State `type` |
| :--------- | :----------------- | :------------- | -------------: | ----------: | :----------- |
| `GF` | `FR-GF` / `973` | French Guiana | 76 | 4822 | overseas region |
| `PF` | `FR-PF` | French Polynesia | 77 | 4824 | overseas collectivity |
| `TF` | `FR-TF` | French Southern and Antarctic Lands | 78 | 5065 | overseas territory |
| `GP` | `FR-GP` / `971` | Guadeloupe | 88 | 4829 | overseas region |
| `MQ` | `FR-MQ` / `972` | Martinique | 138 | 4827 | overseas region |
| `YT` | `FR-YT` / `976` | Mayotte | 141 | 4797 | overseas region |
| `NC` | `FR-NC` | New Caledonia | 157 | 5538 | overseas collectivity with special status |
| `RE` | `FR-RE` / `974` | Réunion | 180 | 4823 | overseas region |
| `PM` | `FR-PM` | Saint Pierre and Miquelon | 187 | 4821 | overseas collectivity |
| `BL` | `FR-BL` | Saint-Barthélemy | 189 | 4794 | overseas collectivity |
| `MF` | `FR-MF` | Saint-Martin (French part) | 190 | 4809 | overseas collectivity |
| `WF` | `FR-WF` | Wallis and Futuna | 243 | 4810 | overseas collectivity |
The “ISO 3166-2 / INSEE” column shows codes like FR-GF / 973, but the current states.iso3166_2 values for DROM in contributions/states/states.json are stored as FR-973, FR-972, etc. (numeric), not FR-GF, FR-MQ, etc. Please either align this table with the repository’s actual stored codes, or clearly label which value is the official ISO 3166-2 code vs. what the repo currently stores.
Suggested change:
| ISO 3166-1 | Official ISO 3166-2 | Stored `states.iso3166_2` | INSEE | Name (English) | `countries.id` | `states.id` | State `type` |
| :--------- | :------------------ | :------------------------ | :---- | :------------- | -------------: | ----------: | :----------- |
| `GF` | `FR-GF` | `FR-973` | `973` | French Guiana | 76 | 4822 | overseas region |
| `PF` | `FR-PF` | `FR-PF` | — | French Polynesia | 77 | 4824 | overseas collectivity |
| `TF` | `FR-TF` | `FR-TF` | — | French Southern and Antarctic Lands | 78 | 5065 | overseas territory |
| `GP` | `FR-GP` | `FR-971` | `971` | Guadeloupe | 88 | 4829 | overseas region |
| `MQ` | `FR-MQ` | `FR-972` | `972` | Martinique | 138 | 4827 | overseas region |
| `YT` | `FR-YT` | `FR-976` | `976` | Mayotte | 141 | 4797 | overseas region |
| `NC` | `FR-NC` | `FR-NC` | — | New Caledonia | 157 | 5538 | overseas collectivity with special status |
| `RE` | `FR-RE` | `FR-974` | `974` | Réunion | 180 | 4823 | overseas region |
| `PM` | `FR-PM` | `FR-PM` | — | Saint Pierre and Miquelon | 187 | 4821 | overseas collectivity |
| `BL` | `FR-BL` | `FR-BL` | — | Saint-Barthélemy | 189 | 4794 | overseas collectivity |
| `MF` | `FR-MF` | `FR-MF` | — | Saint-Martin (French part) | 190 | 4809 | overseas collectivity |
| `WF` | `FR-WF` | `FR-WF` | — | Wallis and Futuna | 243 | 4810 | overseas collectivity |
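For consumers who need the official code while the repository still stores the numeric form, a small translation helper is one option. This is a sketch under the assumption that the stored values look exactly like the table above; the function name is hypothetical and nothing like it is known to exist in the repository.

```python
# INSEE department number -> ISO 3166-1 alpha-2, for the five DROM in the table.
DROM_INSEE_TO_ALPHA = {
    "971": "GP",
    "972": "MQ",
    "973": "GF",
    "974": "RE",
    "976": "YT",
}


def official_iso3166_2(stored: str) -> str:
    """Translate a stored code such as 'FR-973' to the official 'FR-GF'."""
    prefix, sep, suffix = stored.partition("-")
    if prefix != "FR" or not sep:
        return stored  # leave non-French or malformed codes untouched
    return f"FR-{DROM_INSEE_TO_ALPHA.get(suffix, suffix)}"


print(official_iso3166_2("FR-973"))
print(official_iso3166_2("FR-PF"))
```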
Use the `FR` country, then traverse via `state_id`:

```sql
SELECT c.*
FROM cities c
JOIN states s ON c.state_id = s.id
WHERE s.country_code = 'FR'; -- includes all 12 overseas territories
```

This works because every overseas territory has a state row whose `country_code = 'FR'`.
The “everything in the French Republic” SQL example won’t include overseas-territory cities with the current data model: for territory city files (e.g., MQ.json, GP.json), cities.state_id points to a state whose states.country_code is the territory (e.g., MQ), not FR, so WHERE s.country_code = 'FR' filters them out. Consider revising this section to a query pattern that matches current exports (e.g., cities.country_code IN ('FR', ...territory codes...), or another explicit mapping approach).
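One way to express the `cities.country_code IN (...)` pattern suggested in this comment is sketched below against an in-memory SQLite table. The two-column `cities` table and the sample rows are illustrative assumptions, not the real export schema; the territory codes are the 12 listed in this PR.

```python
import sqlite3

# The 12 French overseas territory codes listed in this PR.
FR_TERRITORIES = (
    "GF", "PF", "TF", "GP", "MQ", "YT", "NC", "RE", "PM", "BL", "MF", "WF",
)


def french_republic_cities(conn):
    """Return names of cities in FR or any of its overseas territories."""
    codes = ("FR",) + FR_TERRITORIES
    placeholders = ",".join("?" * len(codes))
    sql = (
        "SELECT name FROM cities "
        f"WHERE country_code IN ({placeholders}) ORDER BY name"
    )
    return [row[0] for row in conn.execute(sql, codes)]


# Illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, country_code TEXT)")
conn.executemany(
    "INSERT INTO cities VALUES (?, ?)",
    [("Paris", "FR"), ("Fort-de-France", "MQ"), ("Brussels", "BE")],
)
print(french_republic_cities(conn))
```

Keeping the territory list in one constant makes the mapping explicit and avoids relying on how `state_id` is wired.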
### "Give me metropolitan France only" (exclude overseas)

Exclude the 12 overseas codes explicitly. The metropolitan vs. overseas split is a political/administrative distinction, not a data-model distinction:

```sql
SELECT * FROM cities
WHERE country_code = 'FR'
AND state_code NOT IN ('GF','PF','TF','GP','MQ','YT','NC','RE','PM','BL','MF','WF',
'971','972','973','974','976'); -- INSEE for DROM
```
The “metropolitan France only” query uses country_code = 'FR' and then excludes overseas codes via state_code NOT IN (...). Given territory cities use country_code equal to the territory ISO2 (e.g., MQ, GP) rather than FR, country_code = 'FR' already excludes them, making the state_code exclusion redundant/misleading. Either simplify this query to just filter country_code = 'FR', or (if you intend to start from the earlier JOIN states ... WHERE s.country_code='FR' pattern) show the correct exclusion at the joined states level.
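The redundancy can be checked with a small experiment, sketched here with made-up rows in an in-memory SQLite table. It assumes, as the comment describes, that territory cities carry the territory's own `country_code`; the `state_code` values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, country_code TEXT, state_code TEXT)")
# Illustrative rows; the state_code values are placeholders.
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [
        ("Paris", "FR", "75C"),
        ("Lyon", "FR", "69M"),
        ("Fort-de-France", "MQ", "972"),
    ],
)

SIMPLE = "SELECT name FROM cities WHERE country_code = 'FR' ORDER BY name"
WITH_EXCLUSION = """
    SELECT name FROM cities
    WHERE country_code = 'FR'
      AND state_code NOT IN ('GF','PF','TF','GP','MQ','YT','NC','RE','PM','BL','MF','WF',
                             '971','972','973','974','976')
    ORDER BY name
"""

simple_rows = [row[0] for row in conn.execute(SIMPLE)]
excluded_rows = [row[0] for row in conn.execute(WITH_EXCLUSION)]
# Both queries return the same rows when territory cities are not under FR.
print(simple_rows == excluded_rows)
```

One further caveat with the longer form: in SQL, `NOT IN` evaluates to unknown for a NULL `state_code`, so such rows would be dropped silently.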
Summary
Adds `MULTI_LEVEL_TERRITORIES.md` at the repo root, documenting the policy decision on issue #1352 (Option C: keep both representations):
- 12 French overseas territories (`GF`, `PF`, `TF`, `GP`, `MQ`, `YT`, `NC`, `PM`, `BL`, `MF`, `WF`, `RE`) appear simultaneously as ISO 3166-1 countries and as ISO 3166-2 subdivisions of `FR`. Both representations are kept; neither is canonical.
- Rationale (`country_code` filtering, locale/timezone metadata, reversibility).
Cross-references added in:
- `.claude/CLAUDE.md`: one-line bullet under Important Rules → DO
- `README.md`: one-line link in the contributing/full-guide row
This is PR-C of a 4-PR Option-C plan for #1352. Pure documentation: no data changes.
Test plan
- Cross-references: `MULTI_LEVEL_TERRITORIES.md` ↔ `.claude/CLAUDE.md` ↔ `README.md`
- `git diff --stat HEAD~1` confirms that none of `contributions/`, `json/`, `csv/`, `xml/`, `yml/`, `sql/`, `sqlite/`, `mongodb/`, etc. were touched
Refs: #1352
🤖 Generated with Claude Code