fix(export): gzip postcode files + gitignore raw versions #1490
Merged
Adds the official Luxembourg postcode dataset from CACLR (Centre des Adresses du Cadastre du Luxembourg) via data.public.lu, CC-Zero.

Why
---
Closes the LU gap on issue #1039. The CACLR registry is the canonical reference for Luxembourgish addresses, published by the LU government under public-domain CC-Zero.

Coverage
--------
- 4,491 unique (code, locality, canton) tuples / 100% state FK
- All 12 CSC cantons covered

Source pipeline
---------------
1. data.public.lu API resolves the latest caclr.xlsx URL (URL is date-stamped and rotates every refresh)
2. Importer parses the denormalised TR.DiCaCoLo.RuCp join sheet directly via openpyxl
3. SOURCE_TO_ISO2 maps 13 source canton labels to 12 CSC iso2 ('LUXEMBOURG-VILLE' capital sub-classification collapses to L)
4. 118 '?' postcodes (newly named streets without assigned codes) are filtered out

License
-------
CC-Zero (public domain). Each row carries `source: "caclr-data-public-lu"` for export-time provenance.

Validation
----------
- python3 -m py_compile passes
- 100% regex match (^(?:L-)?\d{4}$)
- 100% state_id valid + state.country_id == 127 + state_code agrees
- No auto-managed fields (id, created_at, updated_at, flag)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
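The filtering and regex validation described in this commit message can be sketched roughly as below. This is not the importer's actual code: the function name `keep_valid_postcodes`, the row shape, and the `code` field name are assumptions made for illustration, and the sample rows are invented placeholders rather than CACLR data. Only the pattern itself is taken from the commit message.

```python
import re

# Pattern quoted in the commit message's Validation section.
POSTCODE_RE = re.compile(r"^(?:L-)?\d{4}$")


def keep_valid_postcodes(rows):
    """Drop '?' placeholder codes and anything that fails the pattern.

    `rows` is assumed to be an iterable of dicts with a "code" key
    (hypothetical shape, not confirmed by the source).
    """
    kept = []
    for row in rows:
        code = str(row.get("code", "")).strip()
        if code == "?":
            # Streets without an assigned postcode are skipped.
            continue
        if POSTCODE_RE.match(code):
            kept.append(row)
    return kept


# Invented placeholder rows, only to exercise the function.
sample = [
    {"code": "1009", "locality": "Example A"},
    {"code": "L-4001", "locality": "Example B"},
    {"code": "?", "locality": "Example C"},
    {"code": "12345", "locality": "Example D"},
]

for row in keep_valid_postcodes(sample):
    print(row["code"], row["locality"])
```

Skipping the `?` rows before applying the pattern keeps the two conditions separate, which may make it easier to report how many rows each rule dropped.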
The export.yml workflow currently produces raw uncompressed postcode exports (json/postcodes.json 228MB, xml/postcodes.xml 250MB, yml/postcodes.yml 163MB, sqlite/postcodes.sqlite3 101MB, sqlserver/postcodes.sql 70MB, sql/postcodes.sql 86MB), most of which exceed GitHub's 100MB hard limit when peter-evans/create-pull-request tries to push them, breaking the export PR.

Mirror the existing cities.* gzip+gitignore pattern for postcodes.*:
- gzip -9 every generated postcodes.* file alongside cities.*
- gitignore the raw uncompressed postcodes.* in all 7 directories so the export commit doesn't include them; only the .gz goes to the GitHub Release.

Restores the export pipeline so today's data fixes (#1349, #1352: PR-A/B/C/D/E + leveling) can reach the live API.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
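For readers who want to reproduce the compression step outside the workflow, here is a minimal Python sketch of "compress at the highest level, keep the original, overwrite any existing archive". The workflow itself calls the `gzip` CLI; the helper name `gzip_keep` is hypothetical and not part of the repository.

```python
import gzip
import shutil
from pathlib import Path


def gzip_keep(path, level=9):
    """Write <path>.gz next to the source and leave the source in place.

    Opening the destination in "wb" mode overwrites an existing archive.
    """
    src = Path(path)
    dst = src.with_name(src.name + ".gz")
    with src.open("rb") as fin, gzip.open(dst, "wb", compresslevel=level) as fout:
        # Stream in chunks so large exports are not loaded into memory.
        shutil.copyfileobj(fin, fout)
    return dst
```

A call such as `gzip_keep("json/postcodes.json")` would leave `json/postcodes.json` untouched and create `json/postcodes.json.gz` beside it, assuming the file exists at that path.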
CSC Validation Report
PR Format
Labels applied:
Pull request overview
Fixes export workflow push failures caused by newly large postcodes.* export artifacts exceeding GitHub’s 100MB limit by aligning postcodes handling with the existing “compress + don’t commit large exports” approach.
Changes:
- Add gzip compression for additional `postcodes.*` export formats in the export workflow.
- Update `.gitignore` to ignore uncompressed `postcodes.*` export artifacts across export directories.
- Add a new Luxembourg (`LU`) postcode importer script that generates `contributions/postcodes/LU.json` from the CACLR XLSX dataset.
Reviewed changes
Copilot reviewed 1 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `bin/scripts/sync/import_luxembourg_postcodes.py` | New LU postcode importer (fetch + parse XLSX, map cantons → state FK, write LU.json). |
| `.gitignore` | Ignores raw uncompressed postcodes.* exports to avoid committing oversized artifacts. |
| `.github/workflows/export.yml` | Compresses additional postcode export files into .gz for Release uploads. |
dr5hn added a commit that referenced this pull request Apr 27, 2026

…g, postcodes (#1491)

Adds:
- v3.2 release section summarising today's work (#1349, #1352, #1039, #1481-#1490).
- Notable callout for FR mainland city region->department remap (mirrors the existing IT one), explicitly calling out the behaviour change for consumers querying by region state_code.
- Chronological entries for each of the 19 PRs that landed today (changelog automation only runs weekly).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dr5hn added a commit that referenced this pull request Apr 28, 2026

These were committed at tiny placeholder sizes during #1039's initial exports wiring (#1403), but the export.yml workflow regenerates them at full size every run: 239 MB json/postcodes.json, 263 MB xml/postcodes.xml, 171 MB yml/postcodes.yml, 123 MB psql/postcodes.sql, 105 MB sqlite/postcodes.sqlite3, 90 MB sql/postcodes.sql, 73 MB sqlserver/postcodes.sql. Most of these are over GitHub's 100 MB hard limit.

#1490 added .gitignore entries for the same paths, but gitignore is inert against tracked files, so the export PR's git push still failed. Untrack them here so the gitignore actually applies; large compressed .gz versions continue to ship via GitHub Releases.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
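A size check run before the push step is one way to catch this class of failure early. The sketch below is not part of the workflow: the helper name `oversized` is hypothetical, and reading the limit as 100 MiB is an assumption, so the default should be adjusted if GitHub's threshold is interpreted differently.

```python
import os

# Assumption: the push limit is treated as 100 MiB; adjust if needed.
LIMIT_BYTES = 100 * 1024 * 1024


def oversized(paths, limit_bytes=LIMIT_BYTES):
    """Return (path, size_in_bytes) for each existing file above the limit."""
    hits = []
    for path in paths:
        # Missing paths are skipped rather than treated as errors.
        if os.path.isfile(path):
            size = os.path.getsize(path)
            if size > limit_bytes:
                hits.append((path, size))
    return hits
```

Such a check only reports the problem. A file that is already tracked still has to be removed from the index (for example with `git rm --cached`) before a `.gitignore` entry takes effect, which is what this commit does.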
The 15:45 UTC export run (#25004960042) failed with `pre-receive hook declined` because raw postcode export files exceed GitHub's 100MB hard limit:

The `cities.*` files have always been gzipped + gitignored for the same reason. This PR mirrors the same pattern for the `postcodes.*` files (added by #1039 / #1403).

Changes

- `export.yml`: add `gzip -9 -k -f` for `json/postcodes.json`, `xml/postcodes.xml`, `yml/postcodes.yml`, `csv/postcodes.csv`, `sqlite/postcodes.sqlite3`, `sqlserver/postcodes.sql` (the existing `gzip` for `sql/postcodes.sql` and `psql/postcodes.sql` is unchanged).
- `.gitignore`: add raw uncompressed `postcodes.*` for all 7 directories so the export commit only carries small files; large files only land in the GitHub Release.

Why this matters

Blocks today's data fixes from reaching the live API. Once this merges and the export workflow is re-triggered, the API ingest picks up:

Test plan

- `.gz` postcode files appear on the Release.

Refs: #1039, #1349, #1352