Conversation

@jimmygchen jimmygchen (Member) commented Aug 21, 2025

Issue Addressed

Closes:

  • #7865
  • #7855

Changes extracted from earlier PR #7876

This PR fixes two main things, with a few other improvements mentioned below:

  • Prevents Lighthouse from repeatedly sending DataColumnByRoot requests to an unsynced peer, which caused lookup sync to get stuck (see the sketch below)
  • Allows Lighthouse to send discovery requests when there aren't enough synced peers in the required sampling subnets. This fixes the stuck-sync scenario where there aren't enough usable peers in a sampling subnet but no discovery is attempted.
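
As a rough illustration of the lookup-side fix, here is a minimal Rust sketch, with invented `SyncStatus` and `PeerInfo` types rather than Lighthouse's actual definitions, that filters lookup candidates down to synced or advanced peers:

```rust
// Illustrative types only; Lighthouse's real PeerDB types differ.
#[derive(Clone, Copy, PartialEq)]
enum SyncStatus {
    Synced,
    Advanced,
    Behind,
    Unknown,
}

struct PeerInfo {
    sync_status: SyncStatus,
}

impl PeerInfo {
    fn is_synced_or_advanced(&self) -> bool {
        matches!(self.sync_status, SyncStatus::Synced | SyncStatus::Advanced)
    }
}

/// Keep only peers that can actually serve lookup requests, so we stop
/// re-sending DataColumnByRoot requests to unsynced peers.
fn lookup_candidates(peers: &[PeerInfo]) -> Vec<&PeerInfo> {
    peers.iter().filter(|p| p.is_synced_or_advanced()).collect()
}
```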

Proposed Changes

  • Make peer discovery queries if the custody subnet peer count drops below the minimum threshold (see the sketch after this list)
  • Update the peer pruning logic to prioritise a uniform distribution across all data column subnets, and avoid pruning sampling peers when their count is below the target threshold (2)
  • Check sync status when making discovery requests, so that we don't ignore requests when there aren't enough synced peers in the required sampling subnets
  • Optimise some of the PeerDB functions that check custody peers
  • Only send lookup requests to peers that are synced or advanced
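
A minimal sketch of the discovery trigger from the first bullet, assuming a hypothetical `TARGET_PEERS_PER_SAMPLING_SUBNET` constant and a precomputed count of synced peers per subnet; the names are illustrative, not Lighthouse's actual API:

```rust
use std::collections::HashMap;

// Hypothetical threshold matching the target count mentioned above.
const TARGET_PEERS_PER_SAMPLING_SUBNET: usize = 2;

type SubnetId = u64;

/// Return the sampling subnets whose synced-peer count is below target;
/// the caller would kick off a discovery query for each of them.
fn subnets_needing_discovery(synced_peers: &HashMap<SubnetId, usize>) -> Vec<SubnetId> {
    synced_peers
        .iter()
        .filter(|&(_, &count)| count < TARGET_PEERS_PER_SAMPLING_SUBNET)
        .map(|(&subnet, _)| subnet)
        .collect()
}
```

Counting only synced peers is what connects the first and third bullets: an unsynced peer on the right subnet no longer suppresses discovery.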

Maintain peers across all sampling subnets. Make discovery requests if custody subnet peers drop below the threshold. Optimise some peerdb functions.
@jimmygchen jimmygchen requested a review from jxs as a code owner August 21, 2025 13:06
@jimmygchen jimmygchen added the ready-for-review, das (Data Availability Sampling), and v8.0.0-rc.0 (Q3 2025 release for Fusaka on Holesky) labels Aug 21, 2025
@jimmygchen jimmygchen mentioned this pull request Aug 21, 2025
mergify bot commented Aug 21, 2025

Some required checks have failed. Could you please take a look @jimmygchen? 🙏

@mergify mergify bot added the waiting-on-author label (the reviewer has suggested changes and awaits their implementation) and removed the ready-for-review label Aug 21, 2025
@jimmygchen jimmygchen added the ready-for-review label and removed the waiting-on-author label Aug 21, 2025
jimmygchen added a commit that referenced this pull request Aug 21, 2025
Squashed commit of the following:

commit 6b1f2c8
Author: Jimmy Chen <[email protected]>
Date:   Fri Aug 22 00:53:11 2025 +1000

    Fix function behaviour

commit edf8571
Author: Jimmy Chen <[email protected]>
Date:   Fri Aug 22 00:46:12 2025 +1000

    Remove brittle and unmaintainable test

commit 232e685
Author: Jimmy Chen <[email protected]>
Date:   Fri Aug 22 00:20:03 2025 +1000

    Prioritize unsynced peers for pruning

commit 9e87e49
Author: Jimmy Chen <[email protected]>
Date:   Thu Aug 21 23:20:13 2025 +1000

    Clean ups.

commit 05baf9c
Author: Jimmy Chen <[email protected]>
Date:   Thu Aug 21 23:04:13 2025 +1000

    Maintain peers across all sampling subnets. Make discovery requests if custody subnet peers drop below the threshold. Optimise some peerdb functions.
@jimmygchen jimmygchen mentioned this pull request Aug 21, 2025
@AgeManning AgeManning (Member) left a comment

I think this is an improvement, but we might want to make the data columns first-class citizens.

The overall goal of this logic was originally to get Lighthouse to reach a steady state where it never had to do any discoveries: it would find and maintain a uniform set of subnet peers.

There are two competing factors: discoveries, which generate peers, and pruning, which removes the excess. If the pruning doesn't match our discovery targets, we might be in a perpetual state of discovering and then pruning the discovered peers.

Before data columns, we would discover peers when we needed them for attestation subnets, then prune down to maintain a uniform set of attestation subnet peers, preventing any future discoveries.

With this change, we now have a new driving requirement: has_good_peers_in_custody_subnet(). We will try to discover peers constantly until we meet this requirement.

But the pruning logic is still to maintain uniform attestation subnets; the only change is that we now don't prune peers that might help with our custody subnet requirement.

I think now that things have changed, we should prioritize a uniform distribution on the data column custody subnets and, as a second priority, manage the attestation subnets. The reason is that the attestation subnets don't have a direct maintain_custody_peers()-like function causing discoveries, so they're less of a priority.

Also, for the attestation subnets we really need, we have a min_ttl which prevents those peers from being pruned when we need them. So we can rely on that to save the crucial ones from being dropped.

So I think the pruning priorities should now be (sketched after this list):

  1. Maintain a uniform distribution of data column subnet peers
  2. a. Don't remove peers that we need for attestation subnets
     b. Don't remove peers that we need for sync committees
  3. If all of the above are satisfied, remove peers to make attestation subnets uniform.
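
A minimal sketch of that ordering with invented types; this illustrates the suggested priorities, not Lighthouse's actual pruning code:

```rust
struct Peer {
    /// Protected, e.g. by an attestation-subnet min_ttl (2a).
    needed_for_attestation_subnet: bool,
    /// Needed for a sync committee (2b).
    needed_for_sync_committee: bool,
    /// How many peers we already have on this peer's data column subnet.
    data_column_subnet_peer_count: usize,
}

/// Order prunable peers so that removals push the data column subnet
/// distribution toward uniformity first (priority 1); peers covered by
/// 2a/2b are excluded from pruning entirely.
fn prune_candidates(peers: &[Peer]) -> Vec<usize> {
    let mut candidates: Vec<usize> = peers
        .iter()
        .enumerate()
        .filter(|(_, p)| !p.needed_for_attestation_subnet && !p.needed_for_sync_committee)
        .map(|(i, _)| i)
        .collect();
    // Prune from the most over-subscribed data column subnets first.
    candidates.sort_by_key(|&i| std::cmp::Reverse(peers[i].data_column_subnet_peer_count));
    candidates
}
```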

@jimmygchen jimmygchen added the waiting-on-author label and removed the ready-for-review label Aug 25, 2025
@mergify mergify bot added the ready-for-review label and removed the waiting-on-author label Aug 25, 2025
@jimmygchen jimmygchen (Member Author) commented

Thanks @AgeManning, yeah I think your suggestion makes sense; I'll make this change.

@jimmygchen jimmygchen added the work-in-progress label and removed the ready-for-review label Aug 25, 2025
@jimmygchen jimmygchen self-assigned this Aug 25, 2025
@jimmygchen jimmygchen changed the title from "Maintain peers across all sampling subnets" to "Maintain peers across all data columnj subnets" Sep 2, 2025
@michaelsproul michaelsproul changed the title from "Maintain peers across all data columnj subnets" to "Maintain peers across all data column subnets" Sep 2, 2025
@AgeManning AgeManning (Member) left a comment

Left some comments

```rust
///
/// This creates a unified structure containing all subnet information for each peer,
/// excluding trusted peers and peers already marked for pruning.
fn build_peer_subnet_info(
```
Member

Maybe for a future PR: rather than calculating this for every peer on every heartbeat, we could just change PeerInfo to store these naturally for each peer.
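
A minimal sketch of that idea, with hypothetical field and method names rather than Lighthouse's real `PeerInfo`:

```rust
use std::collections::HashSet;

/// Cached per-peer subnet view, stored once instead of being rebuilt
/// for every peer on every heartbeat. Names are hypothetical.
struct PeerSubnetInfo {
    attestation_subnets: HashSet<u64>,
    sync_committee_subnets: HashSet<u64>,
    data_column_subnets: HashSet<u64>,
}

struct PeerInfo {
    // ...existing fields elided...
    subnet_info: PeerSubnetInfo,
}

impl PeerInfo {
    /// Refresh the cached view when the peer's metadata changes, so the
    /// heartbeat's pruning pass can read it without recomputation.
    fn on_metadata_update(&mut self, data_columns: HashSet<u64>) {
        self.subnet_info.data_column_subnets = data_columns;
    }
}
```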

Member Author

Yeah, good idea; with a higher peer count it makes even more sense. I'll raise an issue for this.

@ackintosh ackintosh (Member) left a comment

Thanks for this PR, Jimmy! I've left some comments.

@AgeManning AgeManning (Member) left a comment

This looks good to me :)

@jimmygchen jimmygchen added the ready-for-merge label and removed the ready-for-review label Sep 4, 2025
@mergify mergify bot merged commit c2a92f1 into sigp:unstable Sep 4, 2025
37 checks passed
mergify bot pushed a commit that referenced this pull request Sep 4, 2025
I just noticed that one of the tests I added in #7915 is incorrect, after it had been running flaky for a bit.
This PR fixes the scenario and ensures the outcome will always be the same.
kevaundray pushed commits to kevaundray/lighthouse that referenced this pull request Sep 13, 2025
jtraglia pushed commits to jtraglia/lighthouse that referenced this pull request Sep 16, 2025