Sitemap improvements: lastmod, image entries, noindex on excluded pages by andygrunwald · Pull Request #1427 · EngineeringKiosk/webpage

andygrunwald · 2026-05-05T11:06:25Z

Summary

Four isolated commits on andygrunwald/sitemap-improvements, addressing the sitemap audit:

3bfe4136 — Emit <lastmod> for content-driven pages. New build-time helper scripts/sitemap-lastmod.mjs walks the relevant content directories synchronously, parses pubDate / date / publishedAt / latestEpisodePublished out of frontmatter or JSON, and exposes a Map<urlPath, ISO> for the sitemap serialize hook. Coverage: podcast episodes, blog posts, meetup events, and the matching index pages (/podcast/, /blog/, /meetup/<flavor>/, /deutsche-tech-podcasts/, /filme-fuer-softwareentwickler/, /spiele-fuer-softwareentwickler/, /). Tag pages and per-genre/category routes fall back to build time. 277 of 464 URLs now carry <lastmod>.
ecf00e96 — Document the exclude list and fix the typo. Renamed exludeFromSitemap → excludeFromSitemap and added a comment block explaining each entry (meetup/<flavor>/promote/ for short-lived QR-code campaign pages, linktree/ for a redirect-only landing). No behaviour change.
800158d5 — Mark sitemap-excluded pages as noindex,nofollow. Defence-in-depth: the exclude list keeps these URLs out of sitemap-0.xml, but Google can still discover them via inbound QR-code links. Added a noindex boolean prop to MainHead.astro; passed it from the four affected templates (linktree, PromoteSocialImage, PromoteAnnounceNewsletter, PromoteNewsletter). Drive-by: PromoteNewsletter had a dead MainHead import and no <head> at all — added a small inline <head> with the noindex meta and dropped the unused import.
2274501c — Emit image sitemap entries from each page's og:image. New extractOgImage helper reads the rendered HTML for each URL and attaches one image entry per URL via the image: namespace. 458 of 464 URLs now carry an <image:loc>. Per-page selections look right: episode covers for /podcast/episode/..., hosts photo for /, dedicated headers for the directory pages.

Test plan

make build green at 472 pages.
Sitemap exclude list still keeps /linktree/ and meetup/*/promote/ out of the sitemap (0 leaks).
All 7 excluded URLs now ship <meta name="robots" content="noindex, nofollow">; all other pages keep the original index-promoting tag.
Spot-checks on lastmod: /podcast/ = newest episode pubDate, /blog/post/<slug>/ = entry pubDate, /impressum/ correctly lacks a lastmod.
Spot-checks on og:image: episode covers come through as /_astro/<slug>.<hash>.jpg; homepage uses the hosts photo.
Validate emitted XML against the schemas at https://www.sitemaps.org/protocol.html and https://developers.google.com/search/docs/crawling-indexing/sitemaps/image-sitemaps once deployed.
Re-submit the sitemap in Google Search Console after merge.

🤖 Generated with Claude Code

Google uses sitemap lastmod as a freshness signal and ignores changefreq/priority. With ~200 podcast episodes, a daily-syncing tech podcast index, and an actively curated movie list, every URL today is just a bare <loc>, so search engines have no signal for what changed.

Add a small build-time helper (scripts/sitemap-lastmod.mjs) that walks the relevant content directories synchronously, parses pubDate / date / publishedAt / latestEpisodePublished out of frontmatter or JSON, and exposes a Map<urlPath, ISO> for the sitemap serialize hook to look up. The map is built once at config load; the per-URL serialize callback only does Map.get(), so the build cost stays negligible.

Coverage:
- Podcast episodes: pubDate
- Blog posts: pubDate
- Meetup events: date
- /podcast/, /blog/, /meetup/<flavor>/, /deutsche-tech-podcasts/, /filme-fuer-softwareentwickler/, /spiele-fuer-softwareentwickler/, /: max of the underlying entries

Out of scope (v1, defaults to build time): tag pages, per-genre game pages, per-category and per-type movie pages. They refresh on every build anyway, so a missing lastmod just costs a freshness signal, not correctness.

277 of 464 URLs now carry <lastmod>. Verified spot checks:
- /podcast/: 2026-05-05 (newest episode)
- /podcast/episode/00-...: 2022-02-08 (entry pubDate)
- /deutsche-tech-podcasts/: 2026-05-04 (newest tech-podcast episode)
- /: 2026-05-05 (homepage tracks newest episode)
- /impressum/: <no lastmod, correct>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
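The lookup-map approach described in this commit can be illustrated with a minimal sketch. This is not the contents of scripts/sitemap-lastmod.mjs: the function names, the entry shape (`urlPath`, `indexPaths`, `source`), and the regex-based frontmatter parsing are assumptions made for illustration, and the sketch takes file contents as strings instead of walking directories.

```javascript
// Hypothetical sketch; names and frontmatter layout are assumptions,
// not the real scripts/sitemap-lastmod.mjs.
const DATE_KEYS = ["pubDate", "date", "publishedAt", "latestEpisodePublished"];

// Return the first recognised date key of a frontmatter block as an ISO string.
function parseFrontmatterDate(text) {
  const fm = /^---\r?\n([\s\S]*?)\r?\n---/.exec(text);
  if (!fm) return null;
  for (const key of DATE_KEYS) {
    const line = new RegExp(`^${key}:[ \\t]*["']?([^"'\\r\\n]+)["']?[ \\t]*$`, "m");
    const m = line.exec(fm[1]);
    if (!m) continue;
    const parsed = new Date(m[1].trim());
    if (!Number.isNaN(parsed.getTime())) return parsed.toISOString();
  }
  return null;
}

// Build the Map<urlPath, ISO> once. Each entry also pushes its date up to
// the index pages it belongs to, which keep the maximum (newest) value.
function buildLastmodMap(entries) {
  const lastmod = new Map();
  for (const { urlPath, indexPaths = [], source } of entries) {
    const iso = parseFrontmatterDate(source);
    if (!iso) continue; // no usable date: the URL simply gets no lastmod
    lastmod.set(urlPath, iso);
    for (const indexPath of indexPaths) {
      const current = lastmod.get(indexPath);
      // toISOString() output sorts chronologically as plain strings
      if (!current || iso > current) lastmod.set(indexPath, iso);
    }
  }
  return lastmod;
}

// Per-URL serialize step: a single Map.get(), no file access.
function withLastmod(item, lastmod) {
  const iso = lastmod.get(new URL(item.url).pathname);
  return iso ? { ...item, lastmod: iso } : item;
}
```

Keeping the per-URL step to a single map lookup mirrors the design choice stated in the commit: all file parsing happens once, up front, and URLs without an entry are returned unchanged.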

The exclude array carried no rationale, so a future maintainer adding a similar page would not know why /linktree/ and meetup .../promote/ are kept out of the sitemap.

Add a comment block explaining each entry plus the substring-match semantics, and rename `exludeFromSitemap` to `excludeFromSitemap` while we are touching the file.

No behaviour change: the sitemap still emits 464 URLs and none of the intentionally excluded pages leak in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
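For readers unfamiliar with the substring-match semantics mentioned above, a minimal sketch follows. The array entries, the `isInSitemap` name, and the example URLs are assumptions for illustration; the real list in the repository may be spelled differently.

```javascript
// Hypothetical entries modelled on the commit description; the real
// excludeFromSitemap array may contain different strings.
const excludeFromSitemap = [
  "/linktree/", // redirect-only landing page
  "/promote/",  // short-lived QR-code campaign pages under /meetup/<flavor>/
];

// A page stays in the sitemap unless its URL contains any excluded fragment.
function isInSitemap(pageUrl, excluded = excludeFromSitemap) {
  return !excluded.some((fragment) => pageUrl.includes(fragment));
}
```

Because matching is by substring, one short entry covers every meetup flavor, but it would also exclude any future page whose URL happens to contain the same fragment, which is why the comment block is worth having. If the project uses @astrojs/sitemap, a predicate like this could be passed as that integration's `filter` option.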

The sitemap exclude list keeps /linktree/ and meetup */promote/* out of sitemap-0.xml, but Google can still find them through inbound links; the QR codes we hand out at meetups are exactly that kind of link. The robots <meta> in MainHead actively asserted "follow, index" on every page, including these, which contradicted the intent of the exclude list.

Add a `noindex` boolean prop to MainHead. When true, the component emits <meta name="robots" content="noindex, nofollow"> instead of the default index-promoting tag.

Pass `noindex` from:
- src/pages/linktree.astro
- src/components/meetup/PromoteSocialImage.astro
- src/components/meetup/PromoteAnnounceNewsletter.astro
- src/components/meetup/PromoteNewsletter.astro (drive-by: this component imported MainHead but never used it; its template had no <head> at all, so the page was emitting raw HTML with no robots signal. Add a small <head> with the noindex meta directly and drop the dead import.)

Verified all 7 excluded URLs now ship a noindex meta; all other pages keep the original index-promoting tag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
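The prop's effect can be sketched as a plain function. This is not the MainHead.astro source: the `robotsMeta` name is hypothetical, and the exact default `content` string is assumed from the "follow, index" wording above.

```javascript
// Hypothetical helper mirroring the described noindex prop.
// The default content string is an assumption based on the commit text.
function robotsMeta(noindex = false) {
  const content = noindex ? "noindex, nofollow" : "follow, index";
  return `<meta name="robots" content="${content}">`;
}
```

Defaulting the flag to false means existing callers keep the index-promoting tag, so only the four templates listed above need to opt in.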

Add an `extractOgImage` helper that reads the rendered HTML for a URL and returns its og:image meta as an absolute URL. The sitemap serialize hook now attaches one image entry per URL via the `image:` namespace, giving Google Image search a representative thumbnail per page without bloating the sitemap with decorative assets like favicons, brand SVGs, or background patterns.

The pick is good per page type:
- episode pages: episode cover (Astro-processed _astro/<slug>.<hash>.jpg)
- homepage: the hosts photo
- directory pages: the German Tech Podcasts / games / movies header

458 of 464 URLs now carry an <image:image><image:loc>...</image:loc> entry. The 6 without are pages that don't pass `image=` to MainHead; those just omit the image entry and stay intact otherwise.

The helper runs at sitemap-serialise time, after Astro has written every page to dist/, so it can read the actual rendered meta. A tiny regex parser is used because we only need one specific meta tag, which is not worth a full DOM parse.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
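A regex-based extraction along these lines might look like the sketch below. It reuses the `extractOgImage` name from the commit, but the signature (an HTML string plus a site URL) and the two-step matching are assumptions; the actual helper reads files from dist/ and may be written differently.

```javascript
// Hypothetical sketch of an og:image extractor; the signature is assumed.
// Returns an absolute URL string, or null when the page has no og:image.
function extractOgImage(html, siteUrl) {
  // Step 1: find the <meta> tag whose property is exactly og:image
  // (og:image:width and similar do not match).
  const tag = /<meta\b[^>]*\bproperty=["']og:image["'][^>]*>/i.exec(html);
  if (!tag) return null;
  // Step 2: read its content attribute, regardless of attribute order.
  const content = /\bcontent=["']([^"']+)["']/i.exec(tag[0]);
  if (!content) return null;
  try {
    // Resolve root-relative paths such as /_astro/... against the site URL.
    return new URL(content[1], siteUrl).href;
  } catch {
    return null; // malformed URL: omit the image entry
  }
}
```

Returning null for pages without the tag matches the behaviour described above, where the 6 pages lacking an image simply omit the image entry.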

netlify · 2026-05-05T11:06:32Z

✅ Deploy Preview for nifty-bardeen-5c7e53 ready!

Name	Link
🔨 Latest commit	`2274501`
🔍 Latest deploy log	https://app.netlify.com/projects/nifty-bardeen-5c7e53/deploys/69f9cf34e474b10008d93e44
😎 Deploy Preview	https://deploy-preview-1427--nifty-bardeen-5c7e53.netlify.app

andygrunwald and others added 4 commits May 5, 2026 13:01

andygrunwald merged commit eb91bea into main May 5, 2026
6 checks passed

andygrunwald deleted the andygrunwald/sitemap-improvements branch May 5, 2026 11:14
