perf: reduce query path overhead by ~12% (vtgate) and ~3% (gRPC)#20148

Draft
schlubbi wants to merge 3 commits into
vitessio:mainfrom
schlubbi:perf/query-path-overhead

Conversation

@schlubbi schlubbi commented May 20, 2026

This is the result of a pi-autoresearch session that was tasked with improving performance. All changes are AI-generated and have not been verified yet.

Generated Content :copilot:

Reduce per-query CPU overhead across the vtgate execution hot path and gRPC serialization layer through allocation reduction, lazy initialization, and fast-path optimizations.

vtgate Execute path (hot-cache, point select):

  • Before: 11,404 ns/op, 201 allocs/op, 11.4 KB/op
  • After: ~10,000 ns/op, 164 allocs/op, ~9.6 KB/op
  • Improvement: -12% latency, -18% allocations

gRPC Execute round-trip:

  • Before: 91,991 ns/op, 193 allocs/op
  • After: 89,585 ns/op, 185 allocs/op
  • Improvement: -2.6% latency, -4% allocations (range_select: -17% allocs)
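
The percentages above follow directly from the raw benchmark figures. A small sketch for checking the arithmetic is shown here; the helper name `pctChange` is made up for illustration, and the vtgate "after" latency is the approximate ~10,000 ns/op figure quoted above.

```go
package main

import "fmt"

// pctChange returns the relative change from before to after, in percent.
// A negative value means the "after" figure is smaller.
func pctChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	fmt.Printf("vtgate latency: %.1f%%\n", pctChange(11404, 10000)) // -12.3%
	fmt.Printf("vtgate allocs:  %.1f%%\n", pctChange(201, 164))     // -18.4%
	fmt.Printf("gRPC latency:   %.1f%%\n", pctChange(91991, 89585)) // -2.6%
	fmt.Printf("gRPC allocs:    %.1f%%\n", pctChange(193, 185))     // -4.1%
}
```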

Key optimizations:

  • Pool LogStats via sync.Pool, release after query logging
  • Skip trace span annotations when noop tracer is active
  • Skip noteAliasedExprName for ColName (never rewritten)
  • Pool TrackedBuffer in sqlparser.String()
  • Fixed [4]string array in stats key construction
  • Skip CopyBindVariables when no log subscribers
  • Lazy time.Now() and callerID in ExpressionEnv
  • Inline single-shard path in multiGoTransaction
  • Embed BindVarNeeds value in normalizer struct
  • Pool ExecuteRequest in gRPC client
  • Batch-allocate Row/Value arrays in proto3 conversion
  • Combined callInfoContext (embed CallInfo in context type)
  • ResolveDestinations single-destination fast path
  • Single-shard Result fast path (skip AppendResult)
  • Cache RoutingIndexes on Plan at build time
  • Pool AllErrorRecorder in ExecuteMultiShard
  • DestinationKeyspaceID type-assert bypass
  • Stack-allocate PlanKey hash Digest
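
Several of the items above rely on `sync.Pool`. The general pattern can be sketched as follows. This is not the PR's code: `queryStats`, `acquireStats`, and `release` are hypothetical names, and the struct is a simplified stand-in rather than the real LogStats type.

```go
package main

import (
	"fmt"
	"sync"
)

// queryStats is a hypothetical stand-in for a per-query stats object.
type queryStats struct {
	SQL          string
	RowsReturned uint64
	TablesUsed   []string
}

var statsPool = sync.Pool{
	New: func() any { return new(queryStats) },
}

// acquireStats takes an object from the pool, allocating only if it is empty.
func acquireStats(sql string) *queryStats {
	s := statsPool.Get().(*queryStats)
	s.SQL = sql
	return s
}

// release zeroes every field before returning the object to the pool,
// so the next query never observes data left over from the previous one.
func (s *queryStats) release() {
	*s = queryStats{}
	statsPool.Put(s)
}

func main() {
	s := acquireStats("select 1")
	s.RowsReturned = 1
	fmt.Println(s.SQL, s.RowsReturned)
	s.release() // s must not be touched after this point

	next := acquireStats("select 2")
	fmt.Println(next.SQL, next.RowsReturned) // fields start from zero again
	next.release()
}
```

The main risk with this pattern is a use-after-release: if a logger or another goroutine still holds the pointer when it goes back into the pool, it can read or overwrite a later query's data. That is presumably why the list says the object is released only after query logging.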

Description

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Deployment Notes

AI Disclosure

Signed-off-by: Stefan Jöst <stefan@joest.dev>
Signed-off-by: Stefan Jöst <schlubbi@github.com>
Copilot AI review requested due to automatic review settings May 20, 2026 10:20
github-actions bot added this to the v25.0.0 milestone May 20, 2026
vitess-bot bot added labels May 20, 2026:

  • NeedsWebsiteDocsUpdate: What it says
  • NeedsDescriptionUpdate: The description is not clear or comprehensive enough, and needs work
  • NeedsIssue: A linked issue is missing for this Pull Request
  • NeedsBackportReason: If backport labels have been applied to a PR, a justification is required

Contributor

vitess-bot Bot commented May 20, 2026

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes). New features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test. Enhancements and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

Contributor

vitess-bot Bot commented May 20, 2026

Hello! 👋

This Pull Request is now handled by arewefastyet. The current HEAD and future commits will be benchmarked.

You can find the performance comparison on the arewefastyet website.

…Key hash

The previous commit added Init256() usage in engine/plan.go but
the vthash/hash.go and highway/highwayhash.go changes were not
included in the patch.

Signed-off-by: Stefan Jöst <stefan@joest.dev>
Signed-off-by: Stefan Jöst <schlubbi@github.com>
Contributor

Copilot AI left a comment

Pull request overview

This PR targets per-query CPU/allocation reductions in vtgate’s execution hot path and the gRPC serialization/client layers, primarily via pooling, lazy initialization, and single-shard fast paths.

Changes:

  • Introduces pooling for frequently-allocated request/logging/formatting objects (e.g. gRPC ExecuteRequest, vtgate LogStats, sqlparser TrackedBuffer) and adds selective/lazy computation for hot-path fields (e.g. trace annotations, evalengine time/user).
  • Adds vtgate hot-path fast paths (single-shard result handling, stats-key construction, routing resolution) and caches some plan-derived metadata.
  • Optimizes proto3 row/value conversions via batch allocations.
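
To make the "single-shard fast path" idea concrete, here is a minimal sketch. It assumes a simplified `result` type and a hypothetical `combine` helper; neither is taken from the diff.

```go
package main

import "fmt"

// result is a hypothetical, simplified query result.
type result struct {
	Rows         [][]string
	RowsAffected uint64
}

// combine merges per-shard results. With exactly one shard it returns that
// result unchanged, skipping the allocation and copying of the general path.
func combine(shardResults []*result) *result {
	switch len(shardResults) {
	case 0:
		return &result{}
	case 1:
		return shardResults[0] // fast path: nothing to merge
	}
	total := 0
	for _, r := range shardResults {
		total += len(r.Rows)
	}
	merged := &result{Rows: make([][]string, 0, total)}
	for _, r := range shardResults {
		merged.Rows = append(merged.Rows, r.Rows...)
		merged.RowsAffected += r.RowsAffected
	}
	return merged
}

func main() {
	one := &result{Rows: [][]string{{"1"}}, RowsAffected: 1}
	fmt.Println(combine([]*result{one}) == one) // true: same pointer, no copy

	two := &result{Rows: [][]string{{"2"}, {"3"}}, RowsAffected: 2}
	fmt.Println(len(combine([]*result{one, two}).Rows)) // 3
}
```

One trade-off of such a fast path is aliasing: the caller now receives the shard's own result object, so any later mutation is visible to whoever else holds it.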
| File | Description |
| --- | --- |
| go/vt/vttablet/grpctabletconn/conn.go | Pools ExecuteRequest objects for gRPC Execute to reduce allocations. |
| go/vt/vtgate/scatter_conn.go | Allocation reductions and fast paths in scatter execution; pools AllErrorRecorder on success path. |
| go/vt/vtgate/plan_execute.go | Uses cached routing-index info when populating LogStats. |
| go/vt/vtgate/logstats/logstats.go | Pools LogStats objects; adds Release() for reuse. |
| go/vt/vtgate/executor.go | Skips trace annotations for noop tracer; conditional bindvar copying; releases pooled LogStats when not delivered. |
| go/vt/vtgate/evalengine/expr_env.go | Lazily initializes NOW()/current user information in ExpressionEnv. |
| go/vt/vtgate/engine/routing.go | Adds a bindvar numeric fast path to avoid eval machinery for simple cases. |
| go/vt/vtgate/engine/route.go | Adds a single-bindvar fast path in query construction. |
| go/vt/vtgate/engine/plan.go | Caches routing-index metadata on Plan; stack-allocates PlanKey hash hasher. |
| go/vt/sqlparser/tracked_buffer.go | Pools TrackedBuffer usage in sqlparser.String(). |
| go/vt/sqlparser/normalizer.go | Reduces normalizer allocations via lazy map init and embedded BindVarNeeds. |
| go/vt/callinfo/plugin_grpc.go | Uses a combined context+callinfo wrapper to reduce allocations in gRPC contexts. |
| go/vt/callerid/callerid.go | Avoids context wrapping when caller IDs are nil. |
| go/trace/trace.go | Adds trace.IsNoop() to allow callers to skip work when tracing is disabled. |
| go/streamlog/streamlog.go | Adds HasSubscribers(); changes Send to return delivery count. |
| go/sqltypes/proto3.go | Batch allocation optimizations for proto3 row/value conversions. |
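
The noop-tracer check listed for go/trace/trace.go and go/vt/vtgate/executor.go can be illustrated with a small sketch. The `span` interface, `noopSpan`, `recordingSpan`, `isNoop`, and `summarize` below are all hypothetical names; the real Vitess trace API may look different.

```go
package main

import (
	"fmt"
	"strings"
)

// span is a hypothetical tracing interface.
type span interface {
	Annotate(key string, value any)
}

// noopSpan discards every annotation.
type noopSpan struct{}

func (noopSpan) Annotate(string, any) {}

// recordingSpan keeps annotations in memory.
type recordingSpan struct{ tags map[string]any }

func (s *recordingSpan) Annotate(k string, v any) { s.tags[k] = v }

// isNoop reports whether annotations on s would be thrown away.
func isNoop(s span) bool {
	_, ok := s.(noopSpan)
	return ok
}

var summaries int // counts how often the costly summary is built

// summarize stands in for any non-trivial work done only for tracing.
func summarize(sql string) string {
	summaries++
	return strings.ToUpper(sql)
}

// annotate skips building the annotation value when nobody will read it.
func annotate(s span, sql string) {
	if isNoop(s) {
		return
	}
	s.Annotate("sql.summary", summarize(sql))
}

func main() {
	annotate(noopSpan{}, "select 1")
	fmt.Println(summaries) // 0: the summary was never built

	rec := &recordingSpan{tags: map[string]any{}}
	annotate(rec, "select 1")
	fmt.Println(summaries, rec.tags["sql.summary"]) // 1 SELECT 1
}
```

Without the guard, the argument expressions are still evaluated (and boxed into `any`) before the no-op method discards them, which is the overhead this kind of check avoids.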

Copilot's findings

  • Files reviewed: 18/18 changed files
  • Comments generated: 4

Comment thread go/sqltypes/proto3.go
Comment on lines +74 to 110
      // Batch-allocate all Row structs in a single backing array to reduce
      // per-row heap allocations from N to 1.
      backing := make([]querypb.Row, len(rows))
      result := make([]*querypb.Row, len(rows))

      // Pre-allocate a single Lengths backing array for all rows.
      nCols := len(rows[0])
      allLengths := make([]int64, 0, nCols*len(rows))

      // First pass: compute lengths and accumulate total value size.
      totalValueBytes := 0
      for i, r := range rows {
    -     result[i] = RowToProto3(r)
    +     result[i] = &backing[i]
          start := len(allLengths)
          for _, c := range r {
              if c.IsNull() {
                  allLengths = append(allLengths, -1)
              } else {
                  l := c.Len()
                  allLengths = append(allLengths, int64(l))
                  totalValueBytes += l
              }
          }
          backing[i].Lengths = allLengths[start:]
      }

      // Second pass: batch-allocate all Values into a single buffer.
      allValues := make([]byte, 0, totalValueBytes)
      for i, r := range rows {
          start := len(allValues)
          for _, c := range r {
              if !c.IsNull() {
                  allValues = append(allValues, c.Raw()...)
              }
          }
          backing[i].Values = allValues[start:]
      }
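
For readers who want to experiment with the batching idea outside Vitess, here is a self-contained sketch using plain byte slices. The `row` type and `packRows` function are hypothetical. Unlike the excerpt above, this sketch guards against empty input and uses three-index slice expressions to cap each sub-slice's capacity; those are choices of the sketch, not something shown in the diff.

```go
package main

import "fmt"

// row is a hypothetical, simplified wire row: cell lengths plus packed bytes.
type row struct {
	Lengths []int64
	Values  []byte
}

// packRows converts rows of cells (a nil cell means NULL) using one backing
// allocation each for the row structs, the lengths, and the value bytes.
func packRows(rows [][][]byte) []*row {
	if len(rows) == 0 {
		return nil
	}
	nCells, nBytes := 0, 0
	for _, r := range rows {
		nCells += len(r)
		for _, c := range r {
			nBytes += len(c)
		}
	}
	backing := make([]row, len(rows))
	out := make([]*row, len(rows))
	lengths := make([]int64, 0, nCells)
	values := make([]byte, 0, nBytes)

	for i, r := range rows {
		ls, vs := len(lengths), len(values)
		for _, c := range r {
			if c == nil {
				lengths = append(lengths, -1)
				continue
			}
			lengths = append(lengths, int64(len(c)))
			values = append(values, c...)
		}
		// Cap capacity at length so an append on one row's slice
		// reallocates instead of overwriting the next row's data.
		backing[i].Lengths = lengths[ls:len(lengths):len(lengths)]
		backing[i].Values = values[vs:len(values):len(values)]
		out[i] = &backing[i]
	}
	return out
}

func main() {
	out := packRows([][][]byte{
		{[]byte("ab"), nil},
		{[]byte("cde"), []byte{}},
	})
	for _, r := range out {
		fmt.Println(r.Lengths, string(r.Values))
	}
}
```

A side effect of sharing backing arrays is that retaining a single row keeps the memory for all rows alive, which may matter for large result sets.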
Comment on lines +283 to +289
    Type:           getPlanType(primitive),
    QueryType:      sqlparser.ASTToStatementType(stmt),
    Original:       query,
    Instructions:   primitive,
    BindVarNeeds:   bindVarNeeds,
    TablesUsed:     tablesUsed,
    RoutingIndexes: GetRoutingIndexes(primitive),
Comment on lines 499 to 504
          logStats.RowsAffected = qr.RowsAffected
          logStats.RowsReturned = uint64(len(qr.Rows))
          // log the tables and routing indexes used in the plan for successful query execution.
          logStats.TablesUsed = plan.TablesUsed
          executedRoot := vcursor.ExecutedPrimitive()
          if executedRoot == nil {
              executedRoot = plan.Instructions
          }
    -     logStats.RoutingIndexesUsed = engine.GetRoutingIndexes(executedRoot)
    +     logStats.RoutingIndexesUsed = plan.RoutingIndexes
      }
Comment thread go/vt/vtgate/executor.go
Comment on lines +1196 to +1198
    if e.queryLogger.HasSubscribers() {
        logStats.BindVariables = sqltypes.CopyBindVariables(bindVars)
    }
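
The guard above can be tried in isolation with the following sketch. It assumes a hypothetical `queryLogger` that tracks its subscriber count with an atomic counter and uses string-valued bind variables for brevity; the real streamlog implementation is not shown in this conversation and may differ.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// queryLogger is a hypothetical logger that only tracks its subscriber count.
type queryLogger struct {
	subscribers atomic.Int32
}

func (l *queryLogger) Subscribe() { l.subscribers.Add(1) }

// HasSubscribers reports whether anyone would receive a log entry right now.
func (l *queryLogger) HasSubscribers() bool { return l.subscribers.Load() > 0 }

// bindVarsForLog returns a defensive copy of bindVars only when a subscriber
// exists; otherwise it returns nil and avoids the map allocation entirely.
func bindVarsForLog(l *queryLogger, bindVars map[string]string) map[string]string {
	if !l.HasSubscribers() {
		return nil
	}
	cp := make(map[string]string, len(bindVars))
	for k, v := range bindVars {
		cp[k] = v
	}
	return cp
}

func main() {
	l := &queryLogger{}
	bv := map[string]string{"id": "42"}

	fmt.Println(bindVarsForLog(l, bv) == nil) // true: no copy made

	l.Subscribe()
	cp := bindVarsForLog(l, bv)
	bv["id"] = "changed"
	fmt.Println(cp["id"]) // 42: the copy is independent of the original
}
```

A check like this is inherently racy: a subscriber that attaches between the check and the send would receive an entry without bind variables. Whether that is acceptable is a question for reviewers.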
…ched_size

- Revert lazy time.Now() and callerID in ExpressionEnv — caused nil
  pointer panic in TestCompilerReference/FnUnixTimestamp because
  compiled expressions evaluate time functions without the eager init
  the test infrastructure expects.
- Update callinfo/plugin_grpc_test.go for renamed callInfoContext type.
- Regenerate engine/cached_size.go for new RoutingIndexes field.
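
For context on the first item, this is the general shape of lazy initialization and the hazard it introduces. It is only a sketch with hypothetical names (`exprEnv`, `countingClock`), and the source does not confirm that this is the exact failure mode hit in the test.

```go
package main

import (
	"fmt"
	"time"
)

// exprEnv is a hypothetical evaluation environment with a lazily set clock.
type exprEnv struct {
	clock  func() time.Time
	now    time.Time
	nowSet bool
}

// Now reads the clock on first use and returns the same instant afterwards,
// so every time function within one query sees a consistent timestamp.
func (e *exprEnv) Now() time.Time {
	if !e.nowSet {
		e.now = e.clock()
		e.nowSet = true
	}
	return e.now
}

var clockCalls int // counts how often the underlying clock is read

func countingClock() time.Time {
	clockCalls++
	return time.Now()
}

func main() {
	env := &exprEnv{clock: countingClock}
	fmt.Println(clockCalls) // 0: queries without time functions pay nothing

	a, b := env.Now(), env.Now()
	fmt.Println(clockCalls, a.Equal(b)) // 1 true

	// Hazard: code that reads the field directly, instead of calling Now(),
	// sees the zero value until some other caller has triggered the init.
	fresh := &exprEnv{clock: countingClock}
	fmt.Println(fresh.now.IsZero()) // true
}
```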

Signed-off-by: Stefan Jöst <stefan@joest.dev>
Signed-off-by: Stefan Jöst <schlubbi@github.com>

Labels

  • Benchmark me: Add label to PR to run benchmarks
  • NeedsBackportReason: If backport labels have been applied to a PR, a justification is required
  • NeedsDescriptionUpdate: The description is not clear or comprehensive enough, and needs work
  • NeedsIssue: A linked issue is missing for this Pull Request
  • NeedsWebsiteDocsUpdate: What it says

3 participants