Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
95 commits
Select commit Hold shift + click to select a range
4693a17
Implement PoC of vreplication parallel applier
mattlord Feb 27, 2026
1092f25
Update vttablet --help output
mattlord Mar 2, 2026
5ab9591
Update workflows
mattlord Mar 2, 2026
5d3aa13
Remove logging changes
mattlord Mar 2, 2026
5a9e8b7
Undo errant removal
mattlord Mar 2, 2026
0b0c5ed
Add more unit tests
mattlord Mar 2, 2026
90a5739
Unit test fixes
mattlord Mar 2, 2026
c3ea987
Try to de-flake onlineddl_vrepl_stress test in CI
mattlord Mar 2, 2026
f0fa79b
Fix unit test
mattlord Mar 2, 2026
f92a472
More alignment with the serial applier
mattlord Mar 2, 2026
708eac1
Still trying to deflake the onlineddl_vrepl_stress test
mattlord Mar 2, 2026
89f18f7
Test changes
mattlord Mar 2, 2026
8d4c62b
Try to improve OnlineDDL with parallel applier
mattlord Mar 3, 2026
cf6d332
Performance improvements
mattlord Mar 4, 2026
0847da8
Mark the feature as experimental
mattlord Mar 4, 2026
4fbcb3a
Address potential resource leaks
mattlord Mar 4, 2026
75d3d9a
Comment struct members
mattlord Mar 4, 2026
e1ebe8d
vstreamer (producer) perf improvements and func comments
mattlord Mar 4, 2026
164f81c
Use the vterrors package
mattlord Mar 4, 2026
af0cc97
Switch to xxhash for writeset hashing
mattlord Mar 5, 2026
32da160
Use smaller channel buffer sizes
mattlord Mar 5, 2026
2670146
Minor improvements; remove debug logging code
mattlord Mar 5, 2026
debbb13
Restore most of the vstreamer changes
mattlord Mar 6, 2026
7c03539
Fix unit test framework
mattlord Mar 6, 2026
e22e167
Format file
mattlord Mar 6, 2026
42321fa
Merge remote-tracking branch 'origin/main' into vstream_parallel_repl…
mattlord Mar 6, 2026
73b2f17
Benchmark tools and improvements from iterating on the results
mattlord Mar 17, 2026
3723929
Benchmark tools and improvements from iterating on the results
mattlord Mar 17, 2026
f90aae8
Fix parallel applier FK check state after connection rotation
mattlord Mar 18, 2026
50181d5
Separate out benchmarks
mattlord Mar 26, 2026
8aea92e
Use structured logging for new log messages
mattlord Mar 26, 2026
d0ebf23
Merge remote-tracking branch 'origin/main' into vstream_parallel_repl…
mattlord Mar 27, 2026
a1aaecc
Don't log raw queries
mattlord Mar 28, 2026
5fe816d
Fix flaky onlineddl vrepl stress suite test
mattlord Mar 28, 2026
6e47b84
Address test flakes
mattlord Mar 30, 2026
13edc4b
Still trying to deflake the onlineddl vrepl stress suite in CI
mattlord Mar 31, 2026
2b8eb71
Address review comments
mattlord Mar 31, 2026
568bb2d
Merge remote-tracking branch 'origin/main' into vstream_parallel_repl…
mattlord Apr 1, 2026
028c7ab
Merge remote-tracking branch 'origin/main' into vstream_parallel_repl…
mattlord Apr 1, 2026
52311e6
Try to deflake the xtrabackup streaming tests in CI
mattlord Apr 1, 2026
b72055c
Address copilot review comments
mattlord Apr 2, 2026
87db7b4
Address some codex review comments
mattlord Apr 2, 2026
5fbcb51
Address copilot review comments
mattlord Apr 2, 2026
d9ad16d
Fix unit test
mattlord Apr 2, 2026
fa87d9f
Address review comment
mattlord Apr 2, 2026
22837e1
Address codex review comments
mattlord Apr 3, 2026
611f181
Address copilot review comment
mattlord Apr 3, 2026
8af0f2a
Address codex review comments
mattlord Apr 5, 2026
c90d5f6
Add comments about non-parallel applier edge case fixes
mattlord Apr 5, 2026
ea51b45
Address codex review comments
mattlord Apr 5, 2026
6fe9dcf
Address more codex review comments
mattlord Apr 6, 2026
3741cbd
Address more codex review comments
mattlord Apr 6, 2026
5febca7
You guessed it... more codex review fixes
mattlord Apr 6, 2026
fe76ed4
Yes...
mattlord Apr 6, 2026
cc18367
Moar
mattlord Apr 6, 2026
07a0fa1
More fixes
mattlord Apr 6, 2026
1818bcf
Another round
mattlord Apr 6, 2026
15d4664
Review+fix iterations with GPT-5.4
mattlord Apr 8, 2026
20c7094
Merge remote-tracking branch 'origin/main' into vstream_parallel_repl…
mattlord Apr 11, 2026
34d6825
Review+fix iterations with GPT-5.4
mattlord Apr 12, 2026
d2d56ff
Merge remote-tracking branch 'origin/main' into vstream_parallel_repl…
mattlord Apr 15, 2026
0c1dec5
Try to deflake FKExt test
mattlord Apr 16, 2026
fd5646d
Merge remote-tracking branch 'origin/main' into vstream_parallel_repl…
mattlord Apr 16, 2026
7231b4f
Try to address stalls and resulting memory usage
mattlord Apr 16, 2026
44fe8c1
Fixes from TestFKExt failures
mattlord Apr 21, 2026
63b2296
Address review feedback
mattlord Apr 22, 2026
3a833a8
Address review feedback
mattlord Apr 22, 2026
0a36a09
Address more review feedback
mattlord Apr 22, 2026
9bd44c3
Fixups
mattlord Apr 22, 2026
86fbfb4
Add more func comments
mattlord Apr 23, 2026
c8456e2
Code cleanup
mattlord Apr 23, 2026
f7b6ae9
Perf improvements
mattlord Apr 24, 2026
c16d0cf
Merge remote-tracking branch 'origin/main' into vstream_parallel_repl…
mattlord Apr 27, 2026
8243d1a
Merge remote-tracking branch 'origin/main' into vstream_parallel_repl…
mattlord May 12, 2026
c874dcc
Remove watch-replication-stream flags that crept back in
mattlord May 12, 2026
2b33b94
Post merge fixes
mattlord May 12, 2026
2ecdfc9
Address review findings
mattlord May 13, 2026
bc69649
Merge remote-tracking branch 'origin/main' into vstream_parallel_repl…
mattlord May 14, 2026
0c9a9e0
Rebuld vtadmin web protos after merging in main
mattlord May 14, 2026
e796a7f
Review cycle
mattlord May 15, 2026
a83f0fb
Fix linter errors
mattlord May 15, 2026
2e89d1e
Address review comments
mattlord Jun 5, 2026
613810f
Merge remote-tracking branch 'origin/main' into vstream_parallel_repl…
mattlord Jun 5, 2026
40a0d3b
Order fix
mattlord Jun 5, 2026
128fd22
Merge remote-tracking branch 'origin/main' into vstream_parallel_repl…
mattlord Jun 12, 2026
21f86d3
Address Fable review comments
mattlord Jun 12, 2026
4589d8d
Address review feedback
mattlord Jun 12, 2026
3712d5e
Fix unit test
mattlord Jun 12, 2026
0ee1f34
Address existing bug around copy phase and tablet restarts
mattlord Jun 12, 2026
9172243
Address Fable and Codex review comments
mattlord Jun 12, 2026
e791037
Address new gap
mattlord Jun 13, 2026
9204a53
More improvements from reviews
mattlord Jun 13, 2026
5e7e4b9
Address more review feedback
mattlord Jun 13, 2026
500d9d0
More review fixes
mattlord Jun 13, 2026
4fc1b73
Address golangci-lint error
mattlord Jun 13, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .golangci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -236,3 +236,4 @@ formatters:
paths:
- examples$
- ^go/vt/proto/
- ^test/antithesis/
5 changes: 5 additions & 0 deletions CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -197,6 +197,11 @@ return user.NeedsMigration() && migrate(user) || user
- **Copyright header** - New Go files must include the project copyright header with the current year
- **Always run `gofumpt -w`** on changed Go files before committing - this is mandatory
- **Always run `goimports -local "vitess.io/vitess" -w`** on changed Go files before committing
- **Always run `golangci-lint run --path-mode=abs --timeout 10m`** (from the `go/` directory, scoped to the changed package(s)) before reporting work complete. CI runs it and will surface modernize/style issues that `go vet`, `gofumpt`, and `goimports` do not — for example:
- `waitgroup`: prefer `WaitGroup.Go(func() { ... })` over `wg.Add(1); go func() { defer wg.Done(); ... }()`
- `rangeint`: prefer `for range N` over `for i := 0; i < N; i++` when the index is unused
- `bloop`: prefer `b.Loop()` over `for i := 0; i < b.N; i++` in benchmarks
- `unusedparams`, `unusedwrite`, `unusedfunc`: clean these in code you touch
- **Use format verbs precisely** - Use `%s` for strings and `%d` for integers, not `%v` for everything
- **Structured logging** - New log messages should use structured logging with `slog`-style fields (e.g., `log.Warn("message", slog.Any("error", err))`) rather than printf-style logging with format strings
- **Reuse existing helpers** - Before writing new parsing/validation code, check for existing utilities (e.g., `sqlerror` package for MySQL error codes, `mysqlctl.ParseVersionString()`, `strings.Split()`, `topoproto.TabletAliasString()` for formatting tablet aliases)
Expand Down
20 changes: 20 additions & 0 deletions changelog/25.0/25.0.0/summary.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@

- **[Major Changes](#major-changes)**
- **[New Support](#new-support)**
- [Experimental parallel VReplication applier](#vreplication-parallel-applier)
- **[Breaking Changes](#breaking-changes)**
- [`--watch-replication-stream` flag removed](#vttablet-watch-replication-stream-removed)
- [Snapshot Topology feature removed](#vtorc-snapshot-topology-removed)
Expand All @@ -15,6 +16,8 @@
- **[Minor Changes](#minor-changes)**
- **[VReplication](#minor-changes-vreplication)**
- [Default data protection for `_reverse` workflow cancel/complete](#vreplication-reverse-workflow-data-protection)
- [Unknown VStream event types are now hard errors in the applier](#vreplication-unknown-event-error)
- [Workflow config overrides sent to source tablets are now allowlisted](#vreplication-source-overrides-allowlist)
- **[VTGate](#minor-changes-vtgate)**
- [New controls for cross-keyspace reads](#vtgate-cross-keyspace-reads)
- **[VTTablet](#minor-changes-vttablet)**
Expand All @@ -26,6 +29,15 @@

### <a id="new-support"/>New Support</a>

#### <a id="vreplication-parallel-applier"/>Experimental parallel VReplication applier</a>

> [!WARNING]
> This feature is experimental.

VReplication can now apply binlog events using multiple concurrent MySQL connections instead of a single serial connection. Set `--vreplication-parallel-replication-workers=N` (default `1` = serial, maximum `64`) on `vttablet`, or the `vreplication-parallel-replication-workers` per-workflow config override, to dispatch non-conflicting transactions to `N` worker goroutines during the replication (running) phase. Conflicts are detected with target-side writeset hashing (primary key, unique key, and foreign key values — similar to MySQL's own `WRITESET` dependency tracking), so it works regardless of the source's `binlog_transaction_dependency_tracking` setting. Commits remain strictly ordered, so the workflow position, lag metrics, and `WaitForPos` semantics are unchanged. Transactions the conflict detector cannot reason about (DDL, statement-based events, partial row images, prefix/expression unique indexes, and similar) fall back to serial application.

Note that each worker holds two MySQL connections, so a workflow with `N` workers uses `2N+2` target-side connections.

### <a id="breaking-changes"/>Breaking Changes</a>

#### <a id="vttablet-watch-replication-stream-removed"/>`--watch-replication-stream` flag removed</a>
Expand Down Expand Up @@ -84,6 +96,14 @@ When calling `cancel` or `complete` on an auto-generated `_reverse` workflow wit

The `--keep-data` flag help text has been updated to note this default explicitly. This change applies to MoveTables, Reshard, and other VReplication workflow types that use the shared cancel/complete paths.

#### <a id="vreplication-unknown-event-error"/>Unknown VStream event types are now hard errors in the applier</a>

The VReplication applier previously ignored VStream event types it did not recognize. It now fails the workflow with an error for unknown event types (and unknown `on-ddl` actions), failing closed instead of silently skipping events. All event types produced by supported Vitess versions are handled; this only affects streams from sources emitting event types unknown to the target's version.

#### <a id="vreplication-source-overrides-allowlist"/>Workflow config overrides sent to source tablets are now allowlisted</a>

When a workflow has per-workflow config overrides, the target now sends only the source-relevant subset (packet size, timeouts, experimental flags, and similar) to the source tablet's VStreamer instead of the full override map. This keeps newer target-only override keys from failing workflows whose source tablets run an older version that rejects unknown keys.

See [#19906](https://github.com/vitessio/vitess/pull/19906) for details.

### <a id="minor-changes-vtgate"/>VTGate</a>
Expand Down
134 changes: 134 additions & 0 deletions examples/benchmark/bench_compare.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,134 @@
#!/bin/bash

# A/B comparison: serial (workers=1) vs parallel (workers=4) VReplication applier
# with mixed write workload (INSERT/UPDATE/DELETE/bulk operations).

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SCRIPT_DIR" || exit 1

ROW_COUNT=${ROW_COUNT:-200000}
SEED_ROWS=${SEED_ROWS:-10000}
RUN_ORDER=${RUN_ORDER:-random}
export ROW_COUNT SEED_ROWS

echo "============================================"
echo " VReplication Parallel Applier Benchmark"
echo " ROW_COUNT=$ROW_COUNT SEED_ROWS=$SEED_ROWS"
echo "============================================"
echo ""

run_bench() {
local workers=$1
local label=$2

echo ">>> Run: $label (PARALLEL_WORKERS=$workers) <<<"
echo ""

# Teardown any previous state
(cd "$SCRIPT_DIR/../local" && ./501_teardown.sh) 2>/dev/null

# Setup cluster with specified worker count
PARALLEL_WORKERS=$workers ./bench_setup.sh || { echo "FAILED: setup for $label"; return 1; }

# Run benchmark. Use pipefail so a bench_run.sh validation failure is not
# masked by tee's zero exit status.
(
set -o pipefail
./bench_run.sh 2>&1 | tee "/tmp/bench_${workers}_workers.log"
) || { echo "FAILED: bench_run for $label (validation or drain failure)"; return 1; }

echo ""
echo ">>> $label complete <<<"
echo ""
}

case "$RUN_ORDER" in
serial-first)
first_workers=1
first_label="Serial (1 worker)"
second_workers=4
second_label="Parallel (4 workers)"
;;
parallel-first)
first_workers=4
first_label="Parallel (4 workers)"
second_workers=1
second_label="Serial (1 worker)"
;;
random)
if (( RANDOM % 2 == 0 )); then
first_workers=1
first_label="Serial (1 worker)"
second_workers=4
second_label="Parallel (4 workers)"
RUN_ORDER=serial-first
else
first_workers=4
first_label="Parallel (4 workers)"
second_workers=1
second_label="Serial (1 worker)"
RUN_ORDER=parallel-first
fi
;;
*)
echo "Invalid RUN_ORDER: $RUN_ORDER"
exit 1
;;
esac

echo "Run order: $RUN_ORDER"

# Run 1
run_bench "$first_workers" "$first_label" || exit 1

# Teardown between runs
echo "Tearing down between runs..."
(cd "$SCRIPT_DIR/../local" && ./501_teardown.sh) 2>/dev/null
sleep 3

# Run 2
run_bench "$second_workers" "$second_label" || exit 1

# Teardown after
echo "Tearing down after benchmark..."
(cd "$SCRIPT_DIR/../local" && ./501_teardown.sh) 2>/dev/null

# Compare results
echo ""
echo "============================================"
echo " COMPARISON"
echo "============================================"

for workers in 1 4; do
logfile="/tmp/bench_${workers}_workers.log"
if [[ -f "$logfile" ]]; then
echo ""
echo "--- Workers=$workers ---"
grep -E "(Drain time|Throughput|Backlog ops|Seed rows)" "$logfile"
fi
done

# Calculate speedup if both logs exist
serial_log="/tmp/bench_1_workers.log"
parallel_log="/tmp/bench_4_workers.log"
if [[ -f "$serial_log" ]] && [[ -f "$parallel_log" ]]; then
serial_time=$(grep "Drain time" "$serial_log" | grep -o '[0-9]*')
parallel_time=$(grep "Drain time" "$parallel_log" | grep -o '[0-9]*')
if [[ -n "$serial_time" ]] && [[ -n "$parallel_time" ]] && [[ "$parallel_time" -gt 0 ]]; then
# Integer math: multiply by 100 for 2 decimal places
speedup_x100=$((serial_time * 100 / parallel_time))
speedup_whole=$((speedup_x100 / 100))
speedup_frac=$((speedup_x100 % 100))
printf -v speedup_str '%d.%02d' "$speedup_whole" "$speedup_frac"
echo ""
echo "--- Speedup ---"
echo " Serial: ${serial_time}s"
echo " Parallel: ${parallel_time}s"
echo " Speedup: ${speedup_str}x"
fi
fi

echo ""
echo "============================================"
echo "Full logs: /tmp/bench_1_workers.log and /tmp/bench_4_workers.log"
echo "============================================"
Loading
Loading