fix: /v3/health use canonical stacks height header #6347

Draft
wants to merge 9 commits into
base: develop
Jiloc
Contributor

@Jiloc Jiloc commented Aug 5, 2025

This PR builds on top of #6344 and will remain in review until that one is merged.

Previously, the /v3/health endpoint relied on data from NakamotoDownloadStateMachine, which is consumed after evaluation. This led to inconsistent behavior: during syncing, the endpoint worked correctly, but once the node reached chaintip, the necessary data was often already consumed, resulting in missing neighbor info and a 500 error.

This PR introduces a new approach that builds on the changes in #6344, which ensure that all RPC endpoints include the X-Canonical-Stacks-Tip-Height header in their responses. The node now caches the highest tip received from neighbors during normal operation and uses that as the reference point for the /v3/health endpoint.
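
As a rough illustration of the caching idea described above, the sketch below keeps the highest height seen in the X-Canonical-Stacks-Tip-Height header of peer responses. It is not the code from this PR: `PeerTipCache`, `parse_height`, and `observe` are hypothetical names, headers are modeled as a plain `HashMap`, and it assumes the cache only ever moves upward.

```rust
use std::collections::HashMap;

/// Header name as given in the PR description.
const TIP_HEIGHT_HEADER: &str = "X-Canonical-Stacks-Tip-Height";

/// Hypothetical cache of the highest tip height reported by any peer.
/// The real node keeps this state elsewhere; this type is illustrative only.
#[derive(Debug, Default)]
pub struct PeerTipCache {
    highest_seen: Option<u64>,
}

impl PeerTipCache {
    /// Find the header case-insensitively and parse its value as a u64.
    /// Returns None when the header is absent or not a valid number.
    pub fn parse_height(headers: &HashMap<String, String>) -> Option<u64> {
        headers
            .iter()
            .find(|(name, _)| name.eq_ignore_ascii_case(TIP_HEIGHT_HEADER))
            .and_then(|(_, value)| value.trim().parse::<u64>().ok())
    }

    /// Record one peer response, keeping only the maximum height seen so far.
    /// Responses without a usable header leave the cache unchanged.
    pub fn observe(&mut self, headers: &HashMap<String, String>) {
        if let Some(height) = Self::parse_height(headers) {
            self.highest_seen = Some(self.highest_seen.map_or(height, |h| h.max(height)));
        }
    }

    /// Highest peer tip height observed, or None if no peer has reported yet.
    pub fn highest_seen(&self) -> Option<u64> {
        self.highest_seen
    }
}
```

Because the value persists between requests, a reader of the cache still finds a height after the response that produced it has been processed, which is the property the old download-state-machine data lacked.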

Key changes:

  • Removed old logic and associated functions like get_max_stacks_height_of_neighbors.
  • Instead of adding a dedicated 10-minute integration test with multiple nodes, I extended the existing multiple_miner test. This ensures that ConversationHttp::chat is called and that the peer height is updated correctly during normal operation.
  • Updated StacksNodeState::canonical_stacks_tip_height to return PeerNetwork::stacks_tip.height instead of PeerNetwork::burnchain_tip.stacks_block_height.
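
To make the intended use of the two heights concrete, here is a minimal sketch of how a health check might compare the node's own tip height with the cached peer height. The names `HealthSummary`, `HealthError`, and `summarize_health` are hypothetical and do not reflect the actual /v3/health response type; the sketch only assumes that a missing peer height is reported as an explicit error.

```rust
/// Hypothetical health summary; field names are illustrative only.
#[derive(Debug, PartialEq, Eq)]
pub struct HealthSummary {
    pub node_height: u64,
    pub max_peer_height: u64,
    pub blocks_behind: u64,
}

/// Hypothetical error for the case where no peer height is cached yet.
#[derive(Debug, PartialEq, Eq)]
pub enum HealthError {
    NoPeerData,
}

/// Compare the node's tip height with the highest height reported by peers.
/// `saturating_sub` keeps the gap at zero when the node is ahead of its peers.
pub fn summarize_health(
    node_height: u64,
    cached_peer_height: Option<u64>,
) -> Result<HealthSummary, HealthError> {
    let max_peer_height = cached_peer_height.ok_or(HealthError::NoPeerData)?;
    Ok(HealthSummary {
        node_height,
        max_peer_height,
        blocks_behind: max_peer_height.saturating_sub(node_height),
    })
}
```

For example, `summarize_health(121, Some(128))` reports a node that is 7 blocks behind, while `summarize_health(128, None)` returns `HealthError::NoPeerData`.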

To validate the reliability of this approach, I analyzed the logs from a mainnet node at chaintip over the past 24 hours and measured how often it receives responses from peers:

--- Request Time Interval Analysis (Last 24 Hours) ---
Average Time: 6.4775 seconds
Minimum Time: 0.0002 seconds
Maximum Time: 55.2894 seconds
Standard Deviation: 8.1859
Total Intervals Found: 13,440

With peer responses arriving about every 6.5 seconds on average and at most about 55 seconds apart, the cached value won't always be the absolute latest tip, but it provides a close and stable approximation.
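
For readers who want to reproduce this kind of measurement, the sketch below computes the same summary figures from a sorted list of response timestamps in seconds. It is an assumption-laden illustration, not the script used for the numbers above: `IntervalStats` and `interval_stats` are hypothetical names, log parsing is omitted, and the standard deviation is the population form (the PR does not say which form was used).

```rust
/// Summary statistics over the gaps between consecutive timestamps.
#[derive(Debug)]
pub struct IntervalStats {
    pub count: usize,
    pub mean: f64,
    pub min: f64,
    pub max: f64,
    pub std_dev: f64,
}

/// Compute interval statistics from timestamps given in seconds, sorted ascending.
/// Returns None when there are fewer than two timestamps (no interval exists).
pub fn interval_stats(timestamps: &[f64]) -> Option<IntervalStats> {
    if timestamps.len() < 2 {
        return None;
    }
    // Gap between each pair of consecutive timestamps.
    let intervals: Vec<f64> = timestamps.windows(2).map(|w| w[1] - w[0]).collect();
    let n = intervals.len() as f64;
    let mean = intervals.iter().sum::<f64>() / n;
    let min = intervals.iter().copied().fold(f64::INFINITY, f64::min);
    let max = intervals.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    // Population variance: average squared deviation from the mean.
    let variance = intervals.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    Some(IntervalStats {
        count: intervals.len(),
        mean,
        min,
        max,
        std_dev: variance.sqrt(),
    })
}
```

The maximum interval is the figure that matters most for /v3/health, since it bounds how stale the cached peer height can be.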

Applicable issues

Additional info (benefits, drawbacks, caveats)

Checklist

  • Test coverage for new or modified code paths
  • Changelog is updated
  • Required documentation changes (e.g., docs/rpc/openapi.yaml and rpc-endpoints.md for v2 endpoints, event-dispatcher.md for new events)
  • New clarity functions have corresponding PR in clarity-benchmarking repo
  • New integration test(s) added to bitcoin-tests.yml

@Jiloc Jiloc added this to the 3.2.0.0.1 milestone Aug 5, 2025
@Jiloc Jiloc self-assigned this Aug 5, 2025
@Jiloc Jiloc moved this to Status: 💻 In Progress in Stacks Core Eng Aug 5, 2025

codecov bot commented Aug 5, 2025

Codecov Report

❌ Patch coverage is 94.35028% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.79%. Comparing base (799953e) to head (87f9bba).
⚠️ Report is 19 commits behind head on develop.

Files with missing lines                                  Patch %   Lines
stackslib/src/net/mod.rs                                  84.00%    4 Missing ⚠️
stackslib/src/net/api/tests/gethealth.rs                  84.61%    2 Missing ⚠️
stackslib/src/net/api/postblock.rs                        0.00%     1 Missing ⚠️
stackslib/src/net/download/epoch2x.rs                     66.66%    1 Missing ⚠️
...download/nakamoto/tenure_downloader_unconfirmed.rs     0.00%     1 Missing ⚠️
stackslib/src/net/tests/httpcore.rs                       97.82%    1 Missing ⚠️

❌ Your project check has failed because the head coverage (75.79%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

❗ There is a different number of reports uploaded between BASE (799953e) and HEAD (87f9bba).

HEAD has 31 uploads less than BASE
Flag BASE (799953e) HEAD (87f9bba)
124 93
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #6347      +/-   ##
===========================================
- Coverage    81.00%   75.79%   -5.21%     
===========================================
  Files          541      552      +11     
  Lines       347855   351029    +3174     
===========================================
- Hits        281772   266079   -15693     
- Misses       66083    84950   +18867     
Files with missing lines Coverage Δ
stacks-node/src/tests/nakamoto_integrations.rs 55.16% <100.00%> (-12.76%) ⬇️
stacks-node/src/tests/neon_integrations.rs 37.78% <100.00%> (-11.69%) ⬇️
stackslib/src/chainstate/stacks/db/mod.rs 83.61% <ø> (-0.93%) ⬇️
stackslib/src/net/api/callreadonly.rs 93.24% <100.00%> (-0.04%) ⬇️
stackslib/src/net/api/fastcallreadonly.rs 91.70% <100.00%> (-0.04%) ⬇️
stackslib/src/net/api/get_tenures_fork_info.rs 69.14% <ø> (ø)
stackslib/src/net/api/getaccount.rs 91.41% <100.00%> (-0.06%) ⬇️
stackslib/src/net/api/getattachment.rs 96.73% <100.00%> (-0.04%) ⬇️
stackslib/src/net/api/getattachmentsinv.rs 79.61% <100.00%> (-0.13%) ⬇️
stackslib/src/net/api/getblock.rs 81.54% <ø> (-1.79%) ⬇️
... and 70 more

... and 229 files with indirect coverage changes


Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Comment on lines +696 to +697
pub fn canonical_stacks_tip_height(&mut self) -> u64 {
    self.with_node_state(|network, _, _, _, _| network.stacks_tip.height)
Contributor Author

@jcnelson I initially used network.burnchain_tip.canonical_stacks_tip_height here, as you suggested. However, during integration tests (e.g. multiple_miners), I found that burnchain_tip.canonical_stacks_tip_height always lagged behind: in the multiple_miners test it stops at 121, whereas stacks_tip.height reaches 128 by the end of the test.
I must be missing a step in the test to properly trigger the refresh.

For now, I’ve used network.stacks_tip.height, as it matches the expected value and is also used in /v2/info.

I'm open to reverting this to use burnchain_tip if someone has insight into how to consistently trigger its update in integration tests.

@Jiloc Jiloc linked an issue Aug 6, 2025 that may be closed by this pull request
@aldur aldur requested a review from jcnelson August 7, 2025 14:37
@aldur aldur moved this from Status: 💻 In Progress to Status: In Review in Stacks Core Eng Aug 7, 2025
Labels
None yet
Projects
Status: Status: In Review
Development

Successfully merging this pull request may close these issues.

[Bug] /v3/health always fails with HTTP 500
1 participant