Sitemap improvements: lastmod, image entries, noindex on excluded pages #1427
Merged
andygrunwald merged 4 commits into main from andygrunwald/sitemap-improvements on May 5, 2026
Google uses sitemap lastmod as a freshness signal and ignores
changefreq/priority. With ~200 podcast episodes, a daily-syncing tech
podcast index, and an actively curated movie list, every URL today is
just a bare <loc> — search engines have no signal for what changed.
Add a small build-time helper (scripts/sitemap-lastmod.mjs) that walks
the relevant content directories synchronously, parses pubDate / date /
publishedAt / latestEpisodePublished out of frontmatter or JSON, and
exposes a Map<urlPath, ISO> for the sitemap serialize hook to look up.
The map is built once at config load — the per-URL serialize callback
only does Map.get() so the build cost stays negligible.
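
A minimal sketch of the build-once, look-up-per-URL idea described above. It is not the contents of scripts/sitemap-lastmod.mjs: it assumes `---`-delimited frontmatter with a plain `pubDate:` line, takes file contents as strings instead of walking directories, and the names `parseFrontmatterDate`, `buildLastmodMap`, and `withLastmod` are hypothetical.

```javascript
// Hypothetical helper: read one date field out of a `---`-delimited frontmatter block.
// Returns an ISO 8601 string, or undefined if the field is missing or unparseable.
function parseFrontmatterDate(source, field = "pubDate") {
  const frontmatter = source.match(/^---\r?\n([\s\S]*?)\r?\n---/);
  if (!frontmatter) return undefined;
  const line = frontmatter[1].match(
    new RegExp(`^${field}:\\s*["']?([^"'\\r\\n]+)["']?\\s*$`, "m")
  );
  if (!line) return undefined;
  const date = new Date(line[1].trim());
  return Number.isNaN(date.getTime()) ? undefined : date.toISOString();
}

// Build the Map<urlPath, ISO> once. `entries` is [{ urlPath, source }] (assumed shape).
function buildLastmodMap(entries) {
  const map = new Map();
  for (const { urlPath, source } of entries) {
    const iso = parseFrontmatterDate(source);
    if (iso) map.set(urlPath, iso);
  }
  return map;
}

// Per-URL work is a single Map.get(); URLs without an entry are returned unchanged.
function withLastmod(item, lastmodMap) {
  const iso = lastmodMap.get(new URL(item.url).pathname);
  return iso ? { ...item, lastmod: iso } : item;
}
```

In an Astro config, a function like `withLastmod` would presumably be called from the `serialize(item)` option of `@astrojs/sitemap`, with the map built once at module scope.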
Coverage:
- Podcast episodes: pubDate
- Blog posts: pubDate
- Meetup events: date
- /podcast/, /blog/, /meetup/<flavor>/, /deutsche-tech-podcasts/,
/filme-fuer-softwareentwickler/, /spiele-fuer-softwareentwickler/,
/: max of the underlying entries
Out of scope (v1, defaults to build time): tag pages, per-genre game
pages, per-category and per-type movie pages. They refresh on every
build anyway, so a missing lastmod costs only a freshness signal, not
correctness.
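
The "max of the underlying entries" rule for index pages could look roughly like this. The function name and the prefix-based grouping are assumptions for illustration; the real helper may group entries differently.

```javascript
// Hypothetical: an index page's lastmod is the newest lastmod among the
// entries whose URL path starts with the given prefix.
function newestLastmod(lastmodMap, entryPrefix) {
  let newest;
  for (const [urlPath, iso] of lastmodMap) {
    if (!urlPath.startsWith(entryPrefix)) continue;
    // Compare as timestamps so differing ISO precisions still order correctly.
    if (newest === undefined || Date.parse(iso) > Date.parse(newest)) {
      newest = iso;
    }
  }
  return newest; // undefined when no entry matches, so the caller can skip lastmod
}
```

For example, the value for /podcast/ would be `newestLastmod(map, "/podcast/episode/")`, assuming episode URLs share that prefix.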
277 of 464 URLs now carry <lastmod>. Verified spot checks:
  /podcast/                  2026-05-05  (newest episode)
  /podcast/episode/00-...    2022-02-08  (entry pubDate)
  /deutsche-tech-podcasts/   2026-05-04  (newest tech-podcast episode)
  /                          2026-05-05  (homepage tracks newest episode)
  /impressum/                <no lastmod, correct>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The exclude array carried no rationale, so a future maintainer adding a
similar page would not know why /linktree/ and meetup .../promote/ are
kept out of the sitemap.
Add a comment block explaining each entry plus the substring-match
semantics, and rename `exludeFromSitemap` to `excludeFromSitemap` while
we are touching the file.
No behaviour change: the sitemap still emits 464 URLs and none of the
intentionally excluded pages leak in.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
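
A small sketch of what substring-match semantics mean in practice. The array entries below are assumed from the pages named in the commit message, and `isInSitemap` is a hypothetical name; the real config may differ.

```javascript
// Assumed entries, based on the pages named above; the real array may differ.
const excludeFromSitemap = ["/linktree/", "/promote/"];

// Substring semantics: a URL is dropped if it contains ANY entry anywhere in it,
// so one entry covers every meetup flavor's promote pages.
function isInSitemap(pageUrl, excluded = excludeFromSitemap) {
  return !excluded.some((fragment) => pageUrl.includes(fragment));
}
```

A predicate of this shape fits the `filter` option of `@astrojs/sitemap`, which is called with each page URL. The flip side of substring matching is that an overly short entry can exclude unrelated pages, which is one reason to document each entry.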
The sitemap exclude list keeps /linktree/ and meetup */promote/* out of
sitemap-0.xml, but Google can still find them through inbound links —
the QR codes we hand out at meetups are exactly that kind of link. The
robots <meta> in MainHead actively asserted "follow, index" on every
page, including these, which contradicted the intent of the exclude
list.
Add a `noindex` boolean prop to MainHead. When true, the component
emits <meta name="robots" content="noindex, nofollow"> instead of the
default index-promoting tag. Pass `noindex` from:
- src/pages/linktree.astro
- src/components/meetup/PromoteSocialImage.astro
- src/components/meetup/PromoteAnnounceNewsletter.astro
- src/components/meetup/PromoteNewsletter.astro (drive-by: this
component imported MainHead but never used it; its template had no
<head> at all, so the page was emitting raw HTML with no robots
signal. Add a small <head> with the noindex meta directly and drop
the dead import.)
Verified all 7 excluded URLs now ship a noindex meta; all other pages
keep the original index-promoting tag.
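
As a rough illustration of the prop's effect, written as a plain function rather than the actual Astro component: `robotsMeta` is a hypothetical name, and the default value "follow, index" is taken from the description above, not from the component source.

```javascript
// Hypothetical stand-in for the robots logic in MainHead.
// The default content string is an assumption based on the commit message.
function robotsMeta({ noindex = false } = {}) {
  const content = noindex ? "noindex, nofollow" : "follow, index";
  return `<meta name="robots" content="${content}">`;
}
```

Defaulting `noindex` to false keeps every existing caller on the index-promoting tag, so only the excluded templates need to opt in.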
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add an `extractOgImage` helper that reads the rendered HTML for a URL
and returns its og:image meta as an absolute URL. The sitemap serialize
hook now attaches one image entry per URL via the `image:` namespace,
giving Google Image search a representative thumbnail per page without
bloating the sitemap with decorative assets like favicons, brand SVGs,
or background patterns.
The pick is good per page type:
- episode pages: episode cover (Astro-processed
  _astro/<slug>.<hash>.jpg)
- homepage: the hosts photo
- directory pages: the German Tech Podcasts / games / movies header
458 of 464 URLs now carry an <image:image><image:loc>...</image:loc>
entry. The 6 without are pages that don't pass `image=` to MainHead;
those just omit the image entry and stay intact otherwise.
The helper runs at sitemap-serialise time, after Astro has written
every page to dist/, so it can read the actual rendered meta. A tiny
regex parser is used because we only need one specific meta tag, which
is not worth a full DOM parse.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
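
A minimal sketch of the regex-based extraction, under stated assumptions: it takes the HTML as a string (the real helper reads it from dist/), the name `extractOgImageFromHtml` and the `siteUrl` parameter are hypothetical, and it does not decode HTML entities in the attribute value.

```javascript
// Hypothetical sketch: find the og:image meta in rendered HTML and return it as
// an absolute URL, or undefined when the page has none.
function extractOgImageFromHtml(html, siteUrl) {
  // Scan each <meta ...> tag so attribute order does not matter.
  for (const [tag] of html.matchAll(/<meta\b[^>]*>/gi)) {
    // Require the closing quote right after og:image so og:image:width etc. are skipped.
    if (!/\bproperty\s*=\s*["']og:image["']/i.test(tag)) continue;
    const content = tag.match(/\bcontent\s*=\s*(?:"([^"]*)"|'([^']*)')/i);
    const value = content && (content[1] ?? content[2]);
    // new URL(value, base) resolves relative paths and leaves absolute URLs as they are.
    if (value) return new URL(value, siteUrl).href;
  }
  return undefined;
}
```

Returning undefined for pages without an og:image matches the behaviour described above, where such pages simply omit the image entry.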
Summary
Four isolated commits on `andygrunwald/sitemap-improvements`, addressing the sitemap audit:
- 3bfe4136: Emit `<lastmod>` for content-driven pages. New build-time helper `scripts/sitemap-lastmod.mjs` walks the relevant content directories synchronously, parses `pubDate` / `date` / `publishedAt` / `latestEpisodePublished` out of frontmatter or JSON, and exposes a `Map<urlPath, ISO>` for the sitemap `serialize` hook. Coverage: podcast episodes, blog posts, meetup events, and the matching index pages (`/podcast/`, `/blog/`, `/meetup/<flavor>/`, `/deutsche-tech-podcasts/`, `/filme-fuer-softwareentwickler/`, `/spiele-fuer-softwareentwickler/`, `/`). Tag pages and per-genre/category routes fall back to build time. 277 of 464 URLs now carry `<lastmod>`.
- ecf00e96: Document the exclude list and fix the typo. Renamed `exludeFromSitemap` → `excludeFromSitemap` and added a comment block explaining each entry (`meetup/<flavor>/promote/` for short-lived QR-code campaign pages, `linktree/` for a redirect-only landing). No behaviour change.
- 800158d5: Mark sitemap-excluded pages as `noindex,nofollow`. Defence-in-depth: the exclude list keeps these URLs out of `sitemap-0.xml`, but Google can still discover them via inbound QR-code links. Added a `noindex` boolean prop to `MainHead.astro`; passed it from the four affected templates (`linktree`, `PromoteSocialImage`, `PromoteAnnounceNewsletter`, `PromoteNewsletter`). Drive-by: `PromoteNewsletter` had a dead `MainHead` import and no `<head>` at all, so this commit added a small inline `<head>` with the noindex meta and dropped the unused import.
- 2274501c: Emit image sitemap entries from each page's og:image. New `extractOgImage` helper reads the rendered HTML for each URL and attaches one image entry per URL via the `image:` namespace. 458 of 464 URLs now carry an `<image:loc>`. Per-page selections look right: episode covers for `/podcast/episode/...`, hosts photo for `/`, dedicated headers for the directory pages.

Test plan
- `make build` green at 472 pages.
- Exclude list keeps `/linktree/` and `meetup/*/promote/` out of the sitemap (0 leaks).
- All 7 excluded URLs ship `<meta name="robots" content="noindex, nofollow">`; all other pages keep the original index-promoting tag.
- lastmod spot checks: `/podcast/` = newest episode pubDate, `/blog/post/<slug>/` = entry pubDate, `/impressum/` correctly lacks a lastmod.
- Image entries: episode pages use `/_astro/<slug>.<hash>.jpg`; homepage uses the hosts photo.

🤖 Generated with Claude Code