
*: disable async commit in production to make sure the right commit ts of active-active replication#1912

Open
lcwangchao wants to merge 3 commits into tikv:feature/active-active-85 from
lcwangchao:disable_async_commit

@lcwangchao
Contributor

@lcwangchao lcwangchao commented Mar 16, 2026

Summary

ref: pingcap/tidb#64281

This PR makes client-go safer for active-active replication in production, while keeping the existing async commit / 1PC test coverage intact.

  1. Revert the PD dependency change in go.mod so the behavior stays compatible with TiDB/TiKV 8.5.
    This is aligned with the discussion in PD issue #10427, where PD is moving toward ensuring TSO uniqueness across clusters via suffix bits rather than the previous local/global TSO behavior.

  2. Force disable async commit and 1PC in production.
    The reason is that in active-active scenarios, async commit and 1PC may derive the commit TS from maxReadTS + 1 instead of fetching it directly from PD. Such a TS can violate PD's tso-unique-index constraint and therefore produce invalid commit timestamps across clusters.

  3. Keep async commit and 1PC available in test environments.
    Existing async commit / 1PC tests still need to verify those code paths, so this PR adds a test-only switch to re-enable them when building test stores.

  4. Add coverage for the default behavior.
    This PR adds tests to verify that by default async commit and 1PC are disabled, even if a transaction explicitly calls SetEnableAsyncCommit(true) or SetEnable1PC(true).
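The reasoning in item 2 can be illustrated with a toy Go model. It assumes, purely for illustration, that the lowest `suffixBits` bits of every timestamp a cluster's PD issues hold that cluster's unique index; this is a simplification inspired by the suffix-bit direction in PD issue #10427, not PD's actual encoding. `suffixBits`, `pdAllocate`, and `hasUniqueIndex` are hypothetical. The only real detail used is the TSO layout `physical << 18 | logical`.

```go
package main

import "fmt"

// logicalBits is the width of the logical part of a PD/TiKV timestamp:
// ts = physicalMs<<18 | logical.
const logicalBits = 18

// suffixBits is HYPOTHETICAL: this toy model assumes the lowest suffixBits bits
// of every timestamp a cluster's PD issues carry that cluster's unique index.
const suffixBits = 2

func composeTS(physicalMs, logical uint64) uint64 {
	return physicalMs<<logicalBits | logical
}

// pdAllocate is a hypothetical allocator: the seq-th timestamp in a millisecond,
// shifted left so that the suffix bits can hold the cluster index.
func pdAllocate(physicalMs, seq, clusterIndex uint64) uint64 {
	return composeTS(physicalMs, seq<<suffixBits|clusterIndex)
}

// hasUniqueIndex reports whether ts carries clusterIndex in its suffix bits.
func hasUniqueIndex(ts, clusterIndex uint64) bool {
	mask := uint64(1)<<suffixBits - 1
	return ts&mask == clusterIndex
}

func main() {
	const clusterIndex = 1

	// A read timestamp issued by this cluster's PD.
	maxReadTS := pdAllocate(1000, 7, clusterIndex)

	// Async commit / 1PC style: commit TS derived as maxReadTS + 1.
	derivedCommitTS := maxReadTS + 1

	// Standard 2PC style: commit TS fetched from PD.
	pdCommitTS := pdAllocate(1000, 8, clusterIndex)

	fmt.Println(hasUniqueIndex(pdCommitTS, clusterIndex))      // true
	fmt.Println(hasUniqueIndex(derivedCommitTS, clusterIndex)) // false
}
```

In this model, adding 1 to any timestamp always changes its low `suffixBits` bits, so a commit TS that comes out as maxReadTS + 1 can never carry the issuing cluster's index, whereas a TS fetched from PD always does.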

Summary by CodeRabbit

  • Chores

    • CI workflows now run on feature/* branches.
    • PD client dependency version updated across modules.
  • New Features

    • Added a runtime toggle to disable active-active commit support for test scenarios and a probe to inspect one-phase commit behavior.
  • Tests

    • New and extended integration tests to validate default commit protocol behavior and related scenarios.

@ti-chi-bot

ti-chi-bot bot commented Mar 16, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign nrc for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the dco-signoff: yes Indicates the PR's author has signed the dco. label Mar 16, 2026
@coderabbitai

coderabbitai bot commented Mar 16, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7ee04180-20f5-46da-8130-05f79945fd93


Walkthrough

Extends CI triggers to include feature/* branches, updates github.com/tikv/pd/client version across modules, and adds a KVStore runtime toggle to gate active-active commit behavior with corresponding changes in 2PC/1PC decision points and test utilities.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| CI Workflows | .github/workflows/integration.yml, .github/workflows/test.yml | Added `feature/*` branch patterns to the push and pull_request workflow triggers. |
| Dependency Updates | go.mod, examples/gcworker/go.mod, examples/rawkv/go.mod, examples/txnkv/go.mod, examples/txnkv/1pc_txn/go.mod, examples/txnkv/async_commit/go.mod, examples/txnkv/delete_range/go.mod, examples/txnkv/pessimistic_txn/go.mod, examples/txnkv/unsafedestoryrange/go.mod, integration_tests/go.mod | Reverted the github.com/tikv/pd/client indirect version to v0.0.0-20251219084741-029eb6e7d5d0 across modules. |
| KVStore Runtime Flag & API | tikv/kv.go | Added the disableActiveActiveCommitSupport field to KVStore and the public accessor IsActiveActiveCommitSupportDisabled(). |
| 2PC / 1PC Decision Logic | txnkv/transaction/2pc.go | Gated checkAsyncCommit() and checkOnePC() behind a KVStore capability check, so commit-path decisions consult the new flag. |
| Test Helpers & Options | tikv/test_util.go, integration_tests/util_test.go | Added the WithDisableActiveActiveCommitSupportForTest() option and a helper to apply it; test store constructors now pass this option. |
| Test Probe API | txnkv/transaction/test_probe.go | Added a public CheckOnePC() on CommitterProbe to expose the 1PC decision to tests. |
| Integration Tests | integration_tests/2pc_test.go, integration_tests/store_test.go | Updated tests to use the new test option; added TestEnableActiveActiveCommitByDefault() to exercise async-commit/1PC behavior and verify the commit path. |
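A simplified Go model of the gating summarized above. `KVStore`, `Option`, `NewKVStore`, `WithDisableActiveActiveCommitSupportForTest`, and `IsActiveActiveCommitSupportDisabled` mirror the PR's identifiers, but these types are hypothetical stand-ins; the free functions `checkAsyncCommit`/`checkOnePC` and the `txn` struct are also hypothetical, and the real committer in txnkv/transaction/2pc.go applies many more checks.

```go
package main

import "fmt"

// KVStore is a pared-down, hypothetical stand-in for tikv.KVStore.
type KVStore struct {
	// disableActiveActiveCommitSupport defaults to false (production): async
	// commit and 1PC are refused so commit timestamps come from PD.
	// Test stores set it to true to re-enable those paths.
	disableActiveActiveCommitSupport bool
}

// Option follows the functional-option pattern used when building a store.
type Option func(*KVStore)

// WithDisableActiveActiveCommitSupportForTest re-enables async commit and 1PC.
func WithDisableActiveActiveCommitSupportForTest() Option {
	return func(s *KVStore) { s.disableActiveActiveCommitSupport = true }
}

// NewKVStore applies options to a zero-value store (the real constructor takes more arguments).
func NewKVStore(opts ...Option) *KVStore {
	s := &KVStore{}
	for _, opt := range opts {
		opt(s)
	}
	return s
}

func (s *KVStore) IsActiveActiveCommitSupportDisabled() bool {
	return s.disableActiveActiveCommitSupport
}

// txn models the per-transaction switches set by SetEnableAsyncCommit / SetEnable1PC.
type txn struct {
	enableAsyncCommit bool
	enable1PC         bool
}

// checkAsyncCommit consults the store-level gate before the transaction's own switch.
func checkAsyncCommit(s *KVStore, t txn) bool {
	if !s.IsActiveActiveCommitSupportDisabled() {
		return false // production default: always fall back to 2PC
	}
	return t.enableAsyncCommit
}

// checkOnePC applies the same gate to the 1PC path.
func checkOnePC(s *KVStore, t txn) bool {
	if !s.IsActiveActiveCommitSupportDisabled() {
		return false
	}
	return t.enable1PC
}

func main() {
	t := txn{enableAsyncCommit: true, enable1PC: true}
	prod := NewKVStore()
	test := NewKVStore(WithDisableActiveActiveCommitSupportForTest())

	fmt.Println(checkAsyncCommit(prod, t), checkOnePC(prod, t)) // false false
	fmt.Println(checkAsyncCommit(test, t), checkOnePC(test, t)) // true true
}
```

Because the store-level check runs before the per-transaction switch, a transaction that calls SetEnableAsyncCommit(true) or SetEnable1PC(true) still falls back to 2PC on a default store.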
sequenceDiagram
    rect rgba(200,200,255,0.5)
    participant Client
    end
    rect rgba(200,255,200,0.5)
    participant KVStore
    end
    rect rgba(255,200,200,0.5)
    participant Committer
    participant PD
    end

    Client->>KVStore: Commit request
    KVStore->>KVStore: check IsActiveActiveCommitSupportDisabled()
    alt IsActiveActiveCommitSupportDisabled() returns true, test stores only
        KVStore->>Committer: attempt async-commit / 1PC flow
        Committer->>PD: Get timestamp / PD interactions
        Committer->>KVStore: commit
    else production default, active-active commit support enabled
        KVStore->>Committer: use standard 2PC flow
        Committer->>PD: Get timestamp / PD interactions
        Committer->>KVStore: commit
    end
    KVStore->>Client: Commit result

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

lgtm, approved

Suggested reviewers

  • cfzjywxk
  • tiancaiamao
  • ekexium

Poem

🐰 I hopped through branches, ci blossoms bright,
PD versions tuned just right,
A toggle hides the active-active gate,
Tests hop in to validate,
Thump-thump—commits march on tonight.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 37.50%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title directly describes the main change: disabling async commit in production for active-active replication commit timestamp correctness, which aligns with the core objective of the PR. |


@ti-chi-bot ti-chi-bot bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Mar 16, 2026
@lcwangchao lcwangchao force-pushed the disable_async_commit branch 2 times, most recently from 6563410 to 3a99d28 March 16, 2026 09:57
@ti-chi-bot ti-chi-bot bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 16, 2026
@lcwangchao lcwangchao force-pushed the disable_async_commit branch 3 times, most recently from 43201b6 to 2aa1a7f March 16, 2026 11:36
@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 16, 2026
@lcwangchao lcwangchao force-pushed the disable_async_commit branch 2 times, most recently from 31fbda2 to 4582dbc March 16, 2026 11:48
Signed-off-by: Chao Wang <cclcwangchao@hotmail.com>
@lcwangchao lcwangchao force-pushed the disable_async_commit branch 2 times, most recently from 6558ff1 to 10ceb01 March 16, 2026 11:58
@lcwangchao lcwangchao changed the title *: disable async commit *: disable async commit in production to make sure the right commit ts of active-active replication Mar 16, 2026
@lcwangchao
Contributor Author

/retest

@lcwangchao
Contributor Author

@coderabbitai review this PR

@coderabbitai

coderabbitai bot commented Mar 16, 2026

@lcwangchao Sure, I'll review this PR right away!

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
tikv/kv.go (1)

125-128: Consider clarifying the comment to reduce confusion.

The field name disableActiveActiveCommitSupport combined with "default value is false, which disables the async-commit" creates a double-negative that's hard to follow. Consider rephrasing:

-	// disableActiveActiveCommitSupport indicates whether to disable active-active commit support.
-	// The default value is false, which disables the async-commit and 1PC to ensure the commit timestamp should
-	// always follow the constraint by PD settings `tso-unique-index`.
+	// disableActiveActiveCommitSupport when true, disables the active-active replication constraint,
+	// enabling async-commit and 1PC for test purposes. When false (default/production), the constraint
+	// is enforced, disabling async-commit and 1PC to ensure commit timestamps follow the PD `tso-unique-index` setting.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tikv/kv.go` around lines 125 - 128, The comment for
disableActiveActiveCommitSupport is confusing due to a double-negative; rewrite
it to clearly state the default and the effect when the field is true or false
(e.g., "disableActiveActiveCommitSupport controls whether the active-active
commit constraint is lifted. Default false (production): async-commit and 1PC
are disabled so commit timestamps follow PD's tso-unique-index constraint. When
true (tests only): async-commit and 1PC are allowed."). Update the comment above the
disableActiveActiveCommitSupport field in tikv/kv.go to this clearer wording (or
equivalent) so readers can immediately understand the default and the behavior
for true/false.
txnkv/transaction/2pc.go (1)

124-127: Clarify this safety flag before more call sites depend on it.

IsActiveActiveCommitSupportDisabled() reads like true means "feature off", but both gates below only permit async commit/1PC when it returns true. That double negative is easy to wire backwards in another kvstore implementation. A direct name/comment around the real policy (allow legacy async/1PC in tests, etc.) would be much safer.

Also applies to: 1511-1513, 1546-1548
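A sketch of the rename suggested here, using a hypothetical `commitTSPolicy` interface and `AllowAsyncCommitAnd1PC()` method (neither exists in client-go). It also shows why the current name is easy to wire backwards: the positively named method returns the old accessor's value unchanged, with no negation.

```go
package main

import "fmt"

// commitTSPolicy is a hypothetical interface using the positive name from the
// review suggestion; it is not part of client-go.
type commitTSPolicy interface {
	// AllowAsyncCommitAnd1PC reports whether async commit and 1PC may be used.
	// Production stores return false so every commit TS is fetched from PD.
	AllowAsyncCommitAnd1PC() bool
}

// legacyStore models the field and accessor added in this PR.
type legacyStore struct {
	disableActiveActiveCommitSupport bool
}

func (s *legacyStore) IsActiveActiveCommitSupportDisabled() bool {
	return s.disableActiveActiveCommitSupport
}

// AllowAsyncCommitAnd1PC adapts the old accessor. No negation is needed:
// "active-active support disabled" already means "async commit/1PC allowed".
func (s *legacyStore) AllowAsyncCommitAnd1PC() bool {
	return s.IsActiveActiveCommitSupportDisabled()
}

// canUseAsyncCommit shows how a gate reads with the positive name.
func canUseAsyncCommit(p commitTSPolicy, txnEnabled bool) bool {
	return p.AllowAsyncCommitAnd1PC() && txnEnabled
}

func main() {
	// Default (production) store.
	prod := &legacyStore{}
	// Store built with the test-only option.
	test := &legacyStore{disableActiveActiveCommitSupport: true}

	fmt.Println(canUseAsyncCommit(prod, true)) // false
	fmt.Println(canUseAsyncCommit(test, true)) // true
}
```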

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@txnkv/transaction/2pc.go` around lines 124 - 127, The method
IsActiveActiveCommitSupportDisabled() is named and documented with a
double-negative that causes callers to invert logic incorrectly; rename and
re-document it to directly express the policy (e.g., AllowAsyncCommitAnd1PC or
EnableAsyncCommitAnd1PC) and update all call sites in this file where async
commit/1PC are gated (references to IsActiveActiveCommitSupportDisabled() around
the async-commit/1PC checks) so that the gate reads naturally (true => allow
async-commit/1PC). Ensure the new name and comment state the actual intent (for
example: "AllowAsyncCommitAnd1PC indicates tests/legacy environments may permit
async-commit and 1PC despite PD tso-unique-index constraints") and keep behavior
identical except for the renamed API and inverted boolean logic in callers as
needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration_tests/2pc_test.go`:
- Around lines 101-104: The test is calling the public escape hatch
WithDisableActiveActiveCommitSupportForTest via tikv.NewKVStore; move/remove
this test-only option from the non-test public API: relocate the
WithDisableActiveActiveCommitSupportForTest implementation into a _test or
test-only file (e.g., tikv test utilities) and/or make it unexported, then
update integration_tests/2pc_test.go to use the test-only helper or construct
the store without that option so production code cannot opt out; ensure
references to WithDisableActiveActiveCommitSupportForTest are removed from
non-test code and only present in test files.

---

Nitpick comments:
In `@tikv/kv.go`:
- Around line 125-128: The comment for disableActiveActiveCommitSupport is
confusing due to a double-negative; rewrite it to clearly state the default and
the effect when the field is true or false (e.g.,
"disableActiveActiveCommitSupport controls whether the active-active commit
constraint is lifted. Default false (production): async-commit and 1PC are
disabled so commit timestamps follow PD's tso-unique-index constraint. When
true (tests only): async-commit and 1PC are allowed."). Update the comment above the
disableActiveActiveCommitSupport field in tikv/kv.go to this clearer wording (or
equivalent) so readers can immediately understand the default and the behavior
for true/false.

In `@txnkv/transaction/2pc.go`:
- Around line 124-127: The method IsActiveActiveCommitSupportDisabled() is named
and documented with a double-negative that causes callers to invert logic
incorrectly; rename and re-document it to directly express the policy (e.g.,
AllowAsyncCommitAnd1PC or EnableAsyncCommitAnd1PC) and update all call sites in
this file where async commit/1PC are gated (references to
IsActiveActiveCommitSupportDisabled() around the async-commit/1PC checks) so
that the gate reads naturally (true => allow async-commit/1PC). Ensure the new
name and comment state the actual intent (for example: "AllowAsyncCommitAnd1PC
indicates tests/legacy environments may permit async-commit and 1PC despite PD
tso-unique-index constraints") and keep behavior identical except for the
renamed API and inverted boolean logic in callers as needed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8cc297e2-733b-4298-ac14-277ba414f1ec

📥 Commits

Reviewing files that changed from the base of the PR and between 70d01fd and 10ceb01.

⛔ Files ignored due to path filters (2)
  • go.sum is excluded by !**/*.sum
  • integration_tests/go.sum is excluded by !**/*.sum
📒 Files selected for processing (19)
  • .github/workflows/integration.yml
  • .github/workflows/test.yml
  • examples/gcworker/go.mod
  • examples/rawkv/go.mod
  • examples/txnkv/1pc_txn/go.mod
  • examples/txnkv/async_commit/go.mod
  • examples/txnkv/delete_range/go.mod
  • examples/txnkv/go.mod
  • examples/txnkv/pessimistic_txn/go.mod
  • examples/txnkv/unsafedestoryrange/go.mod
  • go.mod
  • integration_tests/2pc_test.go
  • integration_tests/go.mod
  • integration_tests/store_test.go
  • integration_tests/util_test.go
  • tikv/kv.go
  • tikv/test_util.go
  • txnkv/transaction/2pc.go
  • txnkv/transaction/test_probe.go

This reverts commit 70d01fd.

Signed-off-by: Chao Wang <cclcwangchao@hotmail.com>
@lcwangchao lcwangchao force-pushed the disable_async_commit branch from 10ceb01 to 9d5a3dc March 16, 2026 12:34
@lcwangchao
Contributor Author

@coderabbitai review this PR again

@coderabbitai

coderabbitai bot commented Mar 16, 2026

@lcwangchao Sure! I'll do a full review of this PR right away.

✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
tikv/kv.go (1)

125-128: Consider renaming to reduce double-negative confusion.

The current naming creates a confusing double-negative:

  • disableActiveActiveCommitSupport = false (default) → async commit/1PC are disabled
  • disableActiveActiveCommitSupport = true (tests) → async commit/1PC are enabled

A clearer name like enableAsyncCommitForTest or inverting to activeActiveCommitEnabled (default true to block async commit, false in tests to allow it) would make the code more readable.

The test option WithDisableActiveActiveCommitSupportForTest() is also confusing since it actually enables async commit for tests.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tikv/kv.go` around lines 125 - 128, The field
disableActiveActiveCommitSupport and its setter
WithDisableActiveActiveCommitSupportForTest() are causing double-negative
confusion; rename the boolean to a positive, clear name (e.g.,
enableAsyncCommitForTest or activeActiveCommitEnabled) and invert its
default/semantics so the identifier directly describes when async-commit/1PC are
allowed; update all references to disableActiveActiveCommitSupport (including
any checks, comments, and the WithDisableActiveActiveCommitSupportForTest()
method name/behavior) to the new name and semantics, and adjust tests that call
WithDisableActiveActiveCommitSupportForTest() so they use the new setter and
expected boolean value.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tikv/test_util.go`:
- Around lines 114-115: Fix the typo in the comment above the call to
disableActiveActiveCommitSupport(opt...) by replacing "exists tests" with
"existing tests" so the comment reads: "To make sure some existing tests with
async-commit or 1pc commit pass"; update the comment near the opt assignment to
reflect this corrected wording.

---

Nitpick comments:
In `@tikv/kv.go`:
- Around line 125-128: The field disableActiveActiveCommitSupport and its setter
WithDisableActiveActiveCommitSupportForTest() are causing double-negative
confusion; rename the boolean to a positive, clear name (e.g.,
enableAsyncCommitForTest or activeActiveCommitEnabled) and invert its
default/semantics so the identifier directly describes when async-commit/1PC are
allowed; update all references to disableActiveActiveCommitSupport (including
any checks, comments, and the WithDisableActiveActiveCommitSupportForTest()
method name/behavior) to the new name and semantics, and adjust tests that call
WithDisableActiveActiveCommitSupportForTest() so they use the new setter and
expected boolean value.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b0fb0623-da7b-4faf-b710-cf555fffa3b5

📥 Commits

Reviewing files that changed from the base of the PR and between 70d01fd and 9d5a3dc.

⛔ Files ignored due to path filters (2)
  • go.sum is excluded by !**/*.sum
  • integration_tests/go.sum is excluded by !**/*.sum
📒 Files selected for processing (19)
  • .github/workflows/integration.yml
  • .github/workflows/test.yml
  • examples/gcworker/go.mod
  • examples/rawkv/go.mod
  • examples/txnkv/1pc_txn/go.mod
  • examples/txnkv/async_commit/go.mod
  • examples/txnkv/delete_range/go.mod
  • examples/txnkv/go.mod
  • examples/txnkv/pessimistic_txn/go.mod
  • examples/txnkv/unsafedestoryrange/go.mod
  • go.mod
  • integration_tests/2pc_test.go
  • integration_tests/go.mod
  • integration_tests/store_test.go
  • integration_tests/util_test.go
  • tikv/kv.go
  • tikv/test_util.go
  • txnkv/transaction/2pc.go
  • txnkv/transaction/test_probe.go

Signed-off-by: Chao Wang <cclcwangchao@hotmail.com>
@lcwangchao
Copy link
Contributor Author

/hold

@ti-chi-bot ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 17, 2026

Labels

dco-signoff: yes Indicates the PR's author has signed the dco. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
