Skip to content

Add render cache for SVG icons#36863

Merged
silverwind merged 15 commits intogo-gitea:mainfrom
silverwind:svgcache
Mar 10, 2026
Merged

Add render cache for SVG icons#36863
silverwind merged 15 commits intogo-gitea:mainfrom
silverwind:svgcache

Conversation

@silverwind
Copy link
Copy Markdown
Member

@silverwind silverwind commented Mar 8, 2026

Cache the final rendered template.HTML output for SVG icons that use non-default size or class parameters using sync.Map. Icons rendered with default parameters bypass the cache and use a direct read-only map lookup.

Benchmark results for rendering 1000 varied SVG icons under high concurrency (16 goroutines):

Per page (1000 SVGs) Allocs Memory
Uncached 0.356ms 8,014 1.17MB
Cached 0.015ms 1,000 16KB

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Mar 8, 2026
@github-actions github-actions bot added the modifies/go Pull requests that update Go code label Mar 8, 2026
@silverwind silverwind requested a review from Copilot March 8, 2026 11:05
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an in-memory render cache for SVG icon HTML in modules/svg, targeting reduced allocations/latency when the same icon is rendered repeatedly with non-default size/class under concurrency.

Changes:

  • Introduces a global sync.Map cache for rendered template.HTML for (icon,size,class) combinations.
  • Bypasses the cache for default parameters to keep the fast path as a direct map lookup.
  • Clears the render cache during svg.Init() to avoid stale entries across re-initialization.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

@github-actions github-actions bot added modifies/api This PR adds API routes or modifies them and removed modifies/api This PR adds API routes or modifies them labels Mar 8, 2026
@silverwind silverwind force-pushed the svgcache branch 2 times, most recently from 84cd728 to 7ca540a Compare March 8, 2026 11:20
@silverwind silverwind requested a review from Copilot March 8, 2026 11:20
Cache the final rendered template.HTML output for SVG icons that use
non-default size or class parameters using sync.Map. Icons rendered with
default parameters bypass the cache and use a direct read-only map lookup.

Co-Authored-By: Claude (Opus 4.6) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

silverwind and others added 2 commits March 8, 2026 12:25
Use the helper function in MockIcon restore and test setup instead of
manually clearing the map, ensuring cache size counter is also reset.

Co-Authored-By: Claude (Opus 4.6) <noreply@anthropic.com>
Remove redundant output content assertions, focus on cache behavior.

Co-Authored-By: Claude (Opus 4.6) <noreply@anthropic.com>
Signed-off-by: silverwind <me@silverwind.io>
@silverwind silverwind added the performance/speed performance issues with slow downs label Mar 8, 2026
Signed-off-by: silverwind <me@silverwind.io>
@wxiaoguang wxiaoguang marked this pull request as draft March 8, 2026 13:01
The cache key space is naturally bounded by the finite set of
hardcoded template calls. No user input can reach the size or class
parameters, so unbounded growth is impossible in practice.

Co-Authored-By: Claude (Opus 4.6) <noreply@anthropic.com>
@silverwind silverwind marked this pull request as ready for review March 8, 2026 13:30
silverwind and others added 2 commits March 8, 2026 14:42
Init() only runs once per process and MockIcon() replaces the
svgIcons map entry which produces different cache keys, so clearing
is unnecessary.

Co-Authored-By: Claude (Opus 4.6) <noreply@anthropic.com>
Co-Authored-By: Claude (Opus 4.6) <noreply@anthropic.com>
@silverwind
Copy link
Copy Markdown
Member Author

Removed all cache clears. The code paths install/non-install page never happen in the same run, so we never need to clear this cache.

@wxiaoguang
Copy link
Copy Markdown
Contributor

Mutex will make it slower, right? Is it really justified?

Have you benchmarked?

Yes, IIRC, with mutex it was 10 times slower using concurrency of 16. But that was on a page with 1000 varied SVGs, so not really representative.

What kind of Mutex did you use? I don't think RWMutex can be the problem.

@wxiaoguang
Copy link
Copy Markdown
Contributor

Actually I guess no one would feel the "cache".

I think big diff pages ought to have that many SVGs. It's at minimum 5 SVGs per file, so a 200 file diff would have 1000+ SVGs. 300ms is probably still miniscule compared to full render time but not completely zero.

Why 300ms? If I understand correctly, your tests mean that "1000 SVGs take 0.356ms", right?

@silverwind
Copy link
Copy Markdown
Member Author

silverwind commented Mar 8, 2026

Right, microseconds, not milliseconds :)

@wxiaoguang
Copy link
Copy Markdown
Contributor

I beg you to understand the problem and use facts to discuss.

@silverwind
Copy link
Copy Markdown
Member Author

I will benchmark mutex vs. no-mutex in a bit. Scenario will be concurrency 16, 200 SVGs with 50 variations. That should give representative results.

@silverwind
Copy link
Copy Markdown
Member Author

silverwind commented Mar 8, 2026

Written by Claude.

Benchmark results (Apple M3 Max, go test -bench=. -benchmem -count=5, 200 icons, 50 variations, GOMAXPROCS=16, b.SetParallelism(16)):

BenchmarkRenderHTML_CachedNoMutex-16       435403636           2.912 ns/op          0 B/op          0 allocs/op
BenchmarkRenderHTML_Cached-16                8632720         130.5   ns/op          0 B/op          0 allocs/op
BenchmarkRenderHTML_Uncached-16              5451798         226.2   ns/op          5 B/op          0 allocs/op
BenchmarkRenderHTML_DefaultSize-16        1000000000           0.831 ns/op          0 B/op          0 allocs/op
  • CachedNoMutex: same code path as Cached but without RWMutex (pre-warmed read-only cache)
  • Cached: full renderHTML path with RWMutex cache hit
  • Uncached: cache miss every time (unique keys), string manipulation + mutex write
  • DefaultSize: fast path (default size, no extra class, no cache/mutex involved)

RWMutex + function call overhead adds ~128ns at 16 goroutines. The cache is still a net win (~131 vs ~225 ns), but the mutex is clearly the bottleneck in the cached path.

Benchmark code (svg_test.go)
package svg

import (
	"fmt"
	"html/template"
	"testing"

	gitea_html "code.gitea.io/gitea/modules/htmlutil"
)

type benchmarkCall struct {
	icon  string
	size  int
	class string
}

func setupBenchmark(numIcons, numVariations int) []benchmarkCall {
	svgIcons = make(map[string]svgIconItem, numIcons)
	for i := range numIcons {
		name := fmt.Sprintf("icon-%d", i)
		svgIcons[name] = svgIconItem{
			html: fmt.Sprintf(`<svg class="svg %s" width="16" height="16"><path d="M0 0h16v16H0z"/></svg>`, name),
		}
	}
	svgRenderedCache = make(map[svgCacheKey]template.HTML)
	calls := make([]benchmarkCall, numIcons*numVariations)
	for i := range numIcons {
		for v := range numVariations {
			calls[i*numVariations+v] = benchmarkCall{
				icon:  fmt.Sprintf("icon-%d", i),
				size:  16 + v,
				class: "extra-class",
			}
		}
	}
	return calls
}

// BenchmarkRenderHTML_CachedNoMutex benchmarks the same path as Cached but without RWMutex.
func BenchmarkRenderHTML_CachedNoMutex(b *testing.B) {
	calls := setupBenchmark(200, 50)
	for _, c := range calls {
		RenderHTML(c.icon, c.size, c.class)
	}
	numCalls := len(calls)
	b.SetParallelism(16)
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		i := 0
		for pb.Next() {
			c := &calls[i%numCalls]
			size, class := gitea_html.ParseSizeAndClass(defaultSize, "", c.size, c.class)
			if svgItem, ok := svgIcons[c.icon]; ok {
				if size != defaultSize || class != "" {
					cacheKey := svgCacheKey{c.icon, size, class}
					if cachedHTML, cached := svgRenderedCache[cacheKey]; cached && !svgItem.mocking {
						_ = cachedHTML
					}
				}
			}
			i++
		}
	})
}

// BenchmarkRenderHTML_Cached benchmarks the full renderHTML with cache hits.
func BenchmarkRenderHTML_Cached(b *testing.B) {
	calls := setupBenchmark(200, 50)
	for _, c := range calls {
		RenderHTML(c.icon, c.size, c.class)
	}
	numCalls := len(calls)
	b.SetParallelism(16)
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		i := 0
		for pb.Next() {
			c := &calls[i%numCalls]
			RenderHTML(c.icon, c.size, c.class)
			i++
		}
	})
}

// BenchmarkRenderHTML_Uncached benchmarks renderHTML without cache hits (unique keys every time).
func BenchmarkRenderHTML_Uncached(b *testing.B) {
	calls := setupBenchmark(200, 50)
	numCalls := len(calls)
	b.SetParallelism(16)
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		i := 0
		for pb.Next() {
			c := &calls[i%numCalls]
			RenderHTML(c.icon, 1000+i, c.class)
			i++
		}
	})
}

// BenchmarkRenderHTML_DefaultSize benchmarks the fast path (default size, no extra class, no cache).
func BenchmarkRenderHTML_DefaultSize(b *testing.B) {
	calls := setupBenchmark(200, 50)
	numCalls := len(calls)
	b.SetParallelism(16)
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		i := 0
		for pb.Next() {
			RenderHTML(calls[i%numCalls].icon)
			i++
		}
	})
}

@silverwind
Copy link
Copy Markdown
Member Author

silverwind commented Mar 8, 2026

Mutex overhead seems a bit much, not sure though.

@wxiaoguang
Copy link
Copy Markdown
Contributor

wxiaoguang commented Mar 8, 2026

RWMutex.RLock atomic.Add invalidates CPU cache for all cores.

If you need the "limit", the RWMutex is still needed. Otherwise, sync.Map can't count the items.

If you need to make sync.Map have item count, then it needs another "atomic", then the performance degrades to the same level as RWMutex.RLock


OK, maybe can use count with a write-only mutex.

@wxiaoguang
Copy link
Copy Markdown
Contributor

c6884dd

BenchmarkRenderHTML_CachedNoMutex
BenchmarkRenderHTML_CachedNoMutex-12    	166848481	         6.809 ns/op
BenchmarkRenderHTML_Cached
BenchmarkRenderHTML_Cached-12           	168768458	         6.907 ns/op
BenchmarkRenderHTML_Uncached
BenchmarkRenderHTML_Uncached-12         	109087372	        15.36 ns/op
BenchmarkRenderHTML_DefaultSize
BenchmarkRenderHTML_DefaultSize-12      	996117556	         1.219 ns/op

@silverwind
Copy link
Copy Markdown
Member Author

lgtm

@silverwind
Copy link
Copy Markdown
Member Author

Written by Claude.

The ~20x speedup (130 ns → 6.9 ns) comes from eliminating CPU cache line invalidation on the read path:

  • RWMutex.RLock() does an atomic.Add (a write) to increment the reader count. This invalidates the cache line across all cores, causing contention when many goroutines read concurrently.
  • sync.Map.Load() uses only atomic loads (reads) on its internal read-only map — no writes to shared memory, so no cross-core cache line bouncing.

sync.Map is designed exactly for this pattern: frequent reads, rare writes. Its read map is effectively lock-free for lookups, making cached performance nearly identical to the no-mutex baseline.

@GiteaBot GiteaBot added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Mar 8, 2026
@wxiaoguang
Copy link
Copy Markdown
Contributor

wxiaoguang commented Mar 8, 2026

OK, finally improved the render time of a page with 1000 SVGs about 300us (0.3ms), for most normal pages, less than 0.1ms

@GiteaBot GiteaBot added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels Mar 9, 2026
@lunny lunny added the reviewed/wait-merge This pull request is part of the merge queue. It will be merged soon. label Mar 10, 2026
@silverwind silverwind enabled auto-merge (squash) March 10, 2026 04:59
@silverwind silverwind merged commit 1dfb32a into go-gitea:main Mar 10, 2026
26 checks passed
@silverwind silverwind deleted the svgcache branch March 10, 2026 05:26
@GiteaBot GiteaBot added this to the 1.26.0 milestone Mar 10, 2026
@GiteaBot GiteaBot removed the reviewed/wait-merge This pull request is part of the merge queue. It will be merged soon. label Mar 10, 2026
zjjhot added a commit to zjjhot/gitea that referenced this pull request Mar 10, 2026
* giteaofficial/main:
  Update minimum go version to 1.26.1, golangci-lint to 2.11.2, fix test style (go-gitea#36876)
  Add render cache for SVG icons (go-gitea#36863)
  Fix incorrect viewed files counter if reverted change was viewed (go-gitea#36819)
  [skip ci] Updated translations via Crowdin
  Clean up `refreshViewedFilesSummary` (go-gitea#36868)
  Remove `util.URLJoin` and replace all callers with direct path concatenation (go-gitea#36867)
  Optimize Docker build with dependency layer caching (go-gitea#36864)
  Fix URLJoin, markup render link reoslving, sign-in/up/linkaccount page common data (go-gitea#36861)
  Fix CodeQL code scanning alerts (go-gitea#36858)
  Refactor auth middleware (go-gitea#36848)
  Update Nix flake (go-gitea#36857)
  Update JS deps (go-gitea#36850)
  Load `mentionValues` asynchronously (go-gitea#36739)
  [skip ci] Updated translations via Crowdin
  Fix dbfs error handling (go-gitea#36844)
  Fix OAuth2 authorization code expiry and reuse handling (go-gitea#36797)
  Fix org permission API visibility checks for hidden members and private orgs (go-gitea#36798)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. modifies/go Pull requests that update Go code performance/speed performance issues with slow downs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants