[release-22.0] OnlineDDL: always close lock connection (#19586) #19720

Merged
timvaillancourt merged 2 commits into release-22.0 from backport-19586-to-release-22.0 on Mar 27, 2026
Contributor

@vitess-bot vitess-bot Bot commented Mar 26, 2026

Description

This is a backport of #19586

@vitess-bot vitess-bot Bot added the Type: Bug, Type: Enhancement, and Backport labels Mar 26, 2026
Copilot AI review requested due to automatic review settings March 26, 2026 15:55
@vitess-bot vitess-bot Bot added the Component: Online DDL, Skip CI, and Merge Conflict labels Mar 26, 2026
@vitess-bot vitess-bot Bot review requested due to automatic review settings March 26, 2026 15:55
Contributor Author

vitess-bot Bot commented Mar 26, 2026

Hello @timvaillancourt, there are conflicts in this backport.

Please address them in order to merge this Pull Request. You can execute the snippet below to reset your branch and resolve the conflict manually.

Make sure you replace origin with the name of your vitessio/vitess remote.

git fetch --all
gh pr checkout 19720
git reset --hard origin/release-22.0
git cherry-pick -m 1 39079bbffbbd28e4b64bf20095a6a7fc23daea3e

@github-actions github-actions Bot added this to the v22.0.5 milestone Mar 26, 2026
@timvaillancourt timvaillancourt self-assigned this Mar 26, 2026
@timvaillancourt timvaillancourt removed the Skip CI and Merge Conflict labels Mar 26, 2026
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Copilot AI review requested due to automatic review settings March 26, 2026 17:37
@timvaillancourt
Contributor

Resolved conflicts for backport of #19586.

Conflict: go/vt/vttablet/onlineddl/executor.go — same conflict as #19721. The initConnectionLockWaitTimeout function had already been refactored into initConnectionSessionTimeout on release-22.0 (by a prior backport of #19630). Resolution: took the cherry-pick side to add the new initConnectionLockWaitTimeout wrapper and initDBConnectionLockWaitTimeout function.

Additional fix: replaced log.Warn(fmt.Sprintf(...)) with log.Warningf(...) to match the release branch's logging API.

Contributor

Copilot AI left a comment

Pull request overview

Backport to release-22.0 of an OnlineDDL cutover safety fix to reduce the risk of orphaned LOCK TABLES blocking traffic by ensuring the lock-connection is actively released/closed during cutover teardown.

Changes:

  • Add a lock-connection teardown defer that attempts UNLOCK TABLES and then closes the connection via Kill(...).
  • Refactor lock-wait timeout handling by introducing initConnectionSessionTimeout(...) and delegating initConnectionLockWaitTimeout(...) to it.
  • Minor formatting/cleanup adjustments in executor.go.

Comment on lines +884 to +895
+ defer func() {
+     // Always attempt UNLOCK TABLES first, as it releases locks immediately on this
+     // connection. Then kill the connection as a fallback to guarantee any held locks
+     // are released, even if UNLOCK TABLES were to fail.
+     unlockCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+     defer cancel()
+     if _, err := lockConn.Conn.Exec(unlockCtx, sqlUnlockTables, 1, false); err != nil {
+         log.Warningf("Failed to UNLOCK TABLES in OnlineDDL migration %s: %v", onlineDDL.UUID, err)
+     }
+     if err := lockConn.Conn.Kill("closing lock tables connection", 0); err != nil {
+         log.Warningf("Failed to kill lock tables connection in OnlineDDL migration %s: %v", onlineDDL.UUID, err)
+     }
Copilot AI Mar 26, 2026

The defer comment says killing the lock connection is a fallback if UNLOCK TABLES fails, but the code calls lockConn.Conn.Kill(...) unconditionally. Either make the kill conditional (e.g., only if UNLOCK TABLES errors) or adjust the comment to reflect the intended always-kill behavior.
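
One way to make the comment and the code agree is to treat both steps as unconditional. The sketch below is a minimal, self-contained illustration of that ordering and is not the Vitess implementation: lockConn, fakeConn, releaseLock, and unlockTimeout are hypothetical names introduced here, and Close stands in for the Kill(...) call quoted above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// unlockTimeout bounds the UNLOCK TABLES attempt (hypothetical value).
const unlockTimeout = 5 * time.Second

// lockConn is a hypothetical stand-in for the connection holding LOCK TABLES.
type lockConn interface {
	Exec(ctx context.Context, query string) error
	Close() error
}

// releaseLock always attempts UNLOCK TABLES and then always closes the
// connection, regardless of whether the unlock succeeded. Both errors are
// reported to the caller.
func releaseLock(conn lockConn) error {
	// Use a fresh context so teardown still runs if the caller's context expired.
	ctx, cancel := context.WithTimeout(context.Background(), unlockTimeout)
	defer cancel()

	unlockErr := conn.Exec(ctx, "unlock tables")
	closeErr := conn.Close() // unconditional: not only a fallback
	return errors.Join(unlockErr, closeErr)
}

// fakeConn records calls so the ordering can be observed without a database.
type fakeConn struct {
	failUnlock bool
	calls      []string
}

func (f *fakeConn) Exec(ctx context.Context, query string) error {
	f.calls = append(f.calls, query)
	if f.failUnlock {
		return errors.New("unlock failed")
	}
	return nil
}

func (f *fakeConn) Close() error {
	f.calls = append(f.calls, "close")
	return nil
}

func main() {
	conn := &fakeConn{failUnlock: true}
	err := releaseLock(conn)
	fmt.Println(conn.calls, err)
}
```

With this shape, the accompanying comment would describe the close as the guarantee rather than as a fallback, which is the wording change the review asks for.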

Comment on lines +1208 to +1223
+ func (e *Executor) initConnectionSessionTimeout(ctx context.Context, conn *connpool.Conn, variable string, timeout time.Duration) (deferFunc func(), err error) {
+     deferFunc = func() {}
+
-     if _, err := conn.Exec(ctx, `set @lock_wait_timeout=@@session.lock_wait_timeout`, 0, false); err != nil {
-         return deferFunc, vterrors.Errorf(vtrpcpb.Code_UNKNOWN, "could not read lock_wait_timeout: %v", err)
+     saveQuery, err := sqlparser.ParseAndBind(
+         fmt.Sprintf("set @%s=@@session.%s", variable, variable),
+     )
+     if err != nil {
+         return deferFunc, err
+     }
+     if _, err := conn.Exec(ctx, saveQuery, 0, false); err != nil {
+         return deferFunc, vterrors.Wrapf(err, "could not read %s", variable)
+     }
+     setQuery, err := sqlparser.ParseAndBind(
+         fmt.Sprintf("set @@session.%s=%%a", variable),
+         sqltypes.Int64BindVariable(int64(timeout.Seconds())),
+     )
Copilot AI Mar 26, 2026

initConnectionSessionTimeout builds SET statements by interpolating variable into the SQL (e.g. @@session.%s). Since this is not bindable, it would be safer to validate/whitelist allowed variable names inside this helper to prevent accidental misuse (or future injection risks) if it ever gets called with non-constant input.
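
A rough sketch of what such a guard could look like, assuming an allowlist checked before any string interpolation. The names allowedSessionVariables and buildSaveQuery are hypothetical, and the allowlist contents are illustrative only; the source does not say which variables the helper is called with beyond lock_wait_timeout.

```go
package main

import "fmt"

// allowedSessionVariables is a hypothetical allowlist of session variables
// that may be interpolated into a SET statement. The entries are illustrative.
var allowedSessionVariables = map[string]bool{
	"lock_wait_timeout":        true,
	"innodb_lock_wait_timeout": true,
}

// buildSaveQuery returns the statement that saves the current session value
// into a user variable, rejecting any name that is not on the allowlist.
func buildSaveQuery(variable string) (string, error) {
	if !allowedSessionVariables[variable] {
		return "", fmt.Errorf("unsupported session variable %q", variable)
	}
	return fmt.Sprintf("set @%s=@@session.%s", variable, variable), nil
}

func main() {
	for _, name := range []string{"lock_wait_timeout", "x; drop table t"} {
		query, err := buildSaveQuery(name)
		fmt.Printf("query=%q err=%v\n", query, err)
	}
}
```

Because the variable name cannot be passed as a bind variable, rejecting unknown names up front keeps the interpolation safe even if a future caller passes non-constant input.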

return deferFunc, err
}
deferFunc = func() {
conn.Exec(ctx, restoreQuery, 0, false)
Copilot AI Mar 26, 2026

The restore closure returned by initConnectionSessionTimeout uses the caller's ctx when executing the restore query. If ctx is canceled/expired during unwind, the restore will be skipped and the modified session variable may leak into a pooled connection (notably for renameConn when renameWasSuccessful is true and the connection is recycled). Consider restoring using a fresh context.WithTimeout(context.Background(), ...) and (optionally) logging restore failures.

Suggested change
- conn.Exec(ctx, restoreQuery, 0, false)
+ restoreCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+ defer cancel()
+ if _, err := conn.Exec(restoreCtx, restoreQuery, 0, false); err != nil {
+     log.Warningf("initConnectionSessionTimeout: failed to restore session variable %s: %v", variable, err)
+ }
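
The difference between the two variants can be reproduced without a database. The following is a small sketch under stated assumptions: fakeConn, restoreWithCallerCtx, restoreWithFreshCtx, and demo are hypothetical names, the fake Exec simply refuses to run on a finished context, and the restoreQuery text is assumed because the actual restore statement is not shown in the quoted diff.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// restoreQuery is an assumed restore statement, used for illustration only.
const restoreQuery = "set @@session.lock_wait_timeout=@lock_wait_timeout"

// fakeConn is a hypothetical connection that refuses to run queries on a
// canceled or expired context.
type fakeConn struct{ executed []string }

func (c *fakeConn) Exec(ctx context.Context, query string) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	c.executed = append(c.executed, query)
	return nil
}

// restoreWithCallerCtx captures the caller's context, as in the original line.
func restoreWithCallerCtx(ctx context.Context, conn *fakeConn, query string) func() error {
	return func() error { return conn.Exec(ctx, query) }
}

// restoreWithFreshCtx creates its own bounded context at restore time, as in
// the suggested change.
func restoreWithFreshCtx(conn *fakeConn, query string) func() error {
	return func() error {
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		defer cancel()
		return conn.Exec(ctx, query)
	}
}

// demo cancels the caller's context before running both restore closures and
// returns the error from each.
func demo() (callerErr, freshErr error) {
	ctx, cancel := context.WithCancel(context.Background())
	conn := &fakeConn{}
	stale := restoreWithCallerCtx(ctx, conn, restoreQuery)
	fresh := restoreWithFreshCtx(conn, restoreQuery)
	cancel() // simulate the caller's context ending during unwind
	return stale(), fresh()
}

func main() {
	callerErr, freshErr := demo()
	fmt.Println("caller ctx:", callerErr)
	fmt.Println("fresh ctx:", freshErr)
}
```

In this simulation the closure that reuses the caller's context fails once that context is canceled, while the closure with its own context still runs the restore.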

// fixCompletedTimestampDone fixes a nil `completed_timestamp` columns, see
// https://github.com/vitessio/vitess/issues/13927
// The fix is in release-18.0
// TODO: remove in release-19.0
Copilot AI Mar 26, 2026

This TODO references removing the workaround in release-19.0, but this is the release-22.0 branch. Since these lines are being touched, it would help to either update the TODO to the correct target (or explain why it still needs to exist) to avoid misleading future maintainers.

Suggested change
- // TODO: remove in release-19.0
+ // TODO: remove once it is safe to assume all clusters have been upgraded past release-18.0

@timvaillancourt timvaillancourt marked this pull request as ready for review March 26, 2026 18:39
@timvaillancourt timvaillancourt enabled auto-merge (squash) March 26, 2026 18:39
@timvaillancourt timvaillancourt merged commit 3e94a98 into release-22.0 Mar 27, 2026
110 of 113 checks passed
Labels

Backport, Component: Online DDL, Type: Bug, Type: Enhancement
