Skip to content

hard-invalidate region on RegionNotFound instead of async reload#1916

Open
mittalrishabh wants to merge 2 commits intotikv:masterfrom
mittalrishabh:rish_no_region
Open

hard-invalidate region on RegionNotFound instead of async reload#1916
mittalrishabh wants to merge 2 commits intotikv:masterfrom
mittalrishabh:rish_no_region

Conversation

@mittalrishabh
Copy link
Member

@mittalrishabh mittalrishabh commented Mar 17, 2026

issue #1892

The original fix does not work because in-flight requests continue to fail with region-not-found errors. This change hard-invalidates the region so that concurrent requests stop using the stale cache, and introduces a regionInvalidatedForRetry flag to allow the current request to retry on the leader before giving up.

Unit test

Deployed in prod to verify that it fixed region-not-found errors

Summary by CodeRabbit

  • Bug Fixes
    • Improved region cache invalidation to be synchronous instead of asynchronous, ensuring more immediate consistency when region states change.
    • Enhanced retry logic to allow requests to attempt on the leader even after hard region invalidation, improving reliability in region-not-found scenarios.

mittalrishabh and others added 2 commits March 17, 2026 09:36
…nc reload

Replace AsyncInvalidateCachedRegion with InvalidateCachedRegion in the
onRegionNotFound path so that concurrent requests immediately stop using
the stale region. The current request's selector is allowed one more
leader retry via the new regionInvalidatedForRetry flag, which bypasses
the validity check in next().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: rishabh mittal <mittalrishabh@gmail.com>
Signed-off-by: rishabh mittal <mittalrishabh@gmail.com>
@ti-chi-bot ti-chi-bot bot added the dco-signoff: yes Indicates the PR's author has signed the dco. label Mar 17, 2026
@coderabbitai
Copy link

coderabbitai bot commented Mar 17, 2026

📝 Walkthrough

Walkthrough

This PR removes the AsyncInvalidateCachedRegion method from RegionCache and replaces asynchronous region invalidation with synchronous invalidation throughout the replica selector. A new regionInvalidatedForRetry flag enables leader retry attempts after hard invalidation, changing the invalidation behavior from async reload to sync invalidation with explicit retry control.

Changes

Cohort / File(s) Summary
Region Cache Method Removal
internal/locate/region_cache.go
Removed the public AsyncInvalidateCachedRegion method that previously enabled asynchronous reload on next access by setting sync flags.
Replica Selector Retry Logic
internal/locate/replica_selector.go
Added regionInvalidatedForRetry field to track hard invalidation state and allow leader retry. Replaced AsyncInvalidateCachedRegion calls with synchronous InvalidateCachedRegion, enabling controlled retry behavior in next() and onRegionNotFound() paths.
Test Updates
internal/locate/region_request_state_test.go, internal/locate/replica_selector_test.go
Updated test expectations to reflect hard invalidation instead of asynchronous reload; follower regions now transition to invalid state with comments clarifying the hard-invalidation-with-retry behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

lgtm, approved

Suggested reviewers

  • you06
  • zyguan

Poem

🐰 Out with the async, in with the sync,
Region retries now quick in a blink,
Hard invalidation leads the way,
Leader retries save the day,
Simpler paths for cache to play!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: replacing asynchronous region reload with hard invalidation when RegionNotFound occurs.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

You can make CodeRabbit's review stricter and more nitpicky using the `assertive` profile, if that's what you prefer.

Change the reviews.profile setting to assertive to make CodeRabbit's nitpick more issues in your PRs.

@ti-chi-bot ti-chi-bot bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Mar 17, 2026
@rpaik
Copy link

rpaik commented Mar 17, 2026

@you06 and @zyguan could you help with the review? 🙏

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Mar 18, 2026
@ti-chi-bot
Copy link

ti-chi-bot bot commented Mar 18, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: you06, zyguan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Mar 18, 2026
@ti-chi-bot
Copy link

ti-chi-bot bot commented Mar 18, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-03-18 07:46:35.954736172 +0000 UTC m=+343123.042393699: ☑️ agreed by zyguan.
  • 2026-03-18 13:50:26.908380601 +0000 UTC m=+364953.996038128: ☑️ agreed by you06.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved dco-signoff: yes Indicates the PR's author has signed the dco. lgtm size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants