Skip to content

fix: Respect tx-pool suspend commands before popping verify tasks#5237

Open
Officeyutong wants to merge 3 commits into
nervosnetwork:developfrom
Officeyutong:fix-security-advisory-verify-worker-can-ignore-tx-pool-suspend-command
Open

fix: Respect tx-pool suspend commands before popping verify tasks#5237
Officeyutong wants to merge 3 commits into
nervosnetwork:developfrom
Officeyutong:fix-security-advisory-verify-worker-can-ignore-tx-pool-suspend-command

Conversation

@Officeyutong

Copy link
Copy Markdown
Collaborator

What problem does this PR solve?

Tx-pool verify workers cache the current chunk command in self.status, but that cached status was only updated when the outer tokio::select! received a command notification. If a queue notification triggered processing first, or if a suspend command arrived while the worker was already inside process_inner(), the worker could keep looping with a stale Resume status and continue popping queued transactions. This weakened the suspend mechanism used during block processing and could let remote transaction traffic keep verification workers consuming CPU.

What is changed and how it works?

Refresh the worker command status from the watch receiver before each process_inner() iteration and again before popping a transaction from the verify queue. The command notification branch now uses borrow_and_update() to keep the receiver state consistent. With this change, a worker observes the latest Suspend command before taking more queued verification work.

Related changes

  • PR to update owner/repo:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression
  • Breaking backward compatibility

@Officeyutong Officeyutong marked this pull request as ready for review June 10, 2026 08:45
@Officeyutong Officeyutong requested a review from a team as a code owner June 10, 2026 08:45
@Officeyutong Officeyutong requested review from zhangsoledad and removed request for a team June 10, 2026 08:45
@chainTe

chainTe commented Jun 11, 2026

Copy link
Copy Markdown

Security Review

Result: 1 confirmed finding
Reviewed range: 8f6cacf976899b324b2d23a24e5018b54f3502ec..cc67ffdf6825a4dd0972f9160204afb4c0b32a08

Validation:

  • Existing broad tests were not run locally; CI is expected to cover them.
  • Reviewed tx-pool/src/verify_mgr.rs and traced the supporting tx-pool command, queue, cached verification, and chain suspend/resume paths.

Notes:

  • The patch narrows the stale-status window, but a suspend command can still arrive after the final status refresh and before the worker removes a queued transaction. If that popped transaction has a verify-cache hit, the cached verify_rtx path does not consult command_rx before _process_tx reaches submit_entry, so one transaction can still be submitted during the intended suspend window.
  • Please recheck the command after acquiring the queue write lock and before pop_front, and make cached/fast _process_tx paths honor a pending suspend before submission without dropping a popped transaction.

Comment thread tx-pool/src/verify_mgr.rs
return;
}

self.refresh_status();

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this 4 lines of change is redundant, since we have line of 111 and 116 in same block.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or we should remove line 111 and 116 check?

@eval-exec eval-exec Jun 16, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should move L111 to L115?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the first status refresh after the exit-signal check. I kept the second refresh/check before pop_front, because is_empty().await can yield and a suspend command may arrive during that await; we need to re-read the watch state before taking a transaction from the queue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants