
Conversation

@AgeManning
Member

Motivation

It is becoming clear that Fusaka will benefit from a more connected network.

The requirement to download columns, and the sheer number of columns, requires us to maintain quite a diverse peer set. We also need a diverse peer set for publishing attestations. Optimizing our peer set becomes difficult when we are constrained by the maximum number of peers we can maintain connections with.

Lighthouse (and the overall Ethereum network) gains stability when there are more connections between peers, allowing us to discover and maintain a more diverse peer set that satisfies the custody and subnet requirements.

I'm therefore proposing to double Lighthouse's default peer count to 200.

Justification

The downside of increasing a node's peer count is simply resource consumption. More connected peers means that a node needs to support more RPC requests and handle a greater gossipsub load. This results in bandwidth, memory and CPU increases. However, the increase is not as linear as one might expect. Gossipsub bounds the number of peers that consume the most bandwidth, and increasing our peer count doesn't increase the number of these peers. These are controlled by the mesh parameters.

Therefore, increasing our peer count won't linearly increase our resource consumption. A large increase in peer count should (theoretically) result in only a minimal increase in bandwidth and the corresponding CPU/memory use. As the stability and performance of a Fusaka network are significantly more important to us than an increase in resource use, I think this trade-off is justified.
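
For context, gossipsub's per-topic mesh size is bounded by configuration and does not grow with the total peer count. Below is a minimal sketch of the relevant knobs, assuming the rust-libp2p `gossipsub::ConfigBuilder` API (names may differ slightly between libp2p versions, and the values here are illustrative, not Lighthouse's exact settings):

```rust
use libp2p::gossipsub::ConfigBuilder;

fn main() {
    // Illustrative only: the mesh is bounded per topic, so connecting to more
    // peers does not add more mesh (full-message) peers, only gossip peers.
    let gossipsub_config = ConfigBuilder::default()
        .mesh_n(8)       // target number of mesh peers per topic
        .mesh_n_low(6)   // graft more peers when the mesh drops below this
        .mesh_n_high(12) // prune peers when the mesh exceeds this
        .build()
        .expect("valid gossipsub config");

    let _ = gossipsub_config;
}
```

Because only mesh peers receive full messages, the bandwidth-heavy traffic stays roughly constant while the extra connections mostly add cheaper control and metadata traffic.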

Analysis

I'll add more analysis as we go, but preliminary results look like this:

100 Peers (Current default)

For our default 100 peer target (running over an hour on mainnet and calculating averages):

  Average peers: 91
  LibP2P Bandwidth:
    Inbound:  374.17 KB/s
    Outbound: 269.33 KB/s
  Discovery Bandwidth:
    Inbound:  9.40 KB/s
    Outbound: 1.91 KB/s
  Total Bandwidth:
    Inbound:  383.58 KB/s
    Outbound: 271.24 KB/s
  Average memory usage: 4950.8 MB

This PR 200 Peers (slight gossip modifications)

  Average peers: 183
  LibP2P Bandwidth:
    Inbound:  383.30 KB/s
    Outbound: 284.66 KB/s
  Discovery Bandwidth:
    Inbound:  10.35 KB/s
    Outbound: 2.26 KB/s
  Total Bandwidth:
    Inbound:  393.65 KB/s
    Outbound: 286.92 KB/s
  Average memory usage: 5339.9 MB

Comparison

  Peer count change: +92 (+101.3%)
  LibP2P Inbound change: +9.13 KB/s (+2.4%)
  LibP2P Outbound change: +15.33 KB/s (+5.7%)
  Total Inbound change: +10.07 KB/s (+2.6%)
  Total Outbound change: +15.68 KB/s (+5.8%)

Preliminary Results

Preliminary results show a marginal cost to bandwidth and roughly a 10% increase in memory usage. Although this looks fairly promising, I'll collect more data for a more informed decision. Given the gains in stability, I think it's worth the extra resource consumption.

Also, users can always lower their peer count via the --target-peers flag if the resource usage gets too high for them.

@AgeManning AgeManning requested a review from jxs as a code owner August 14, 2025 08:00
@jimmygchen jimmygchen added fulu Required for the upcoming Fulu hard fork v8.0.0-rc.0 Q3 2025 release for Fusaka on Holesky labels Aug 15, 2025
@jimmygchen jimmygchen mentioned this pull request Aug 15, 2025
@AgeManning
Member Author

AgeManning commented Aug 17, 2025

I've done some further tests. These are the results:

------------------------------------------------------------------------------------
Peers  Total  TCP  QUIC  Dur(h)  CPU%    Mem(MB)  Total(MB/s)  TCP(MB/s)  QUIC(MB/s)
------------------------------------------------------------------------------------
100    103    70   33    1.1     79.44   4997     0.34         0.25       0.10
150    154    99   55    1.1     83.83   5190     0.31         0.26       0.05
200    205    128  77    1.1     83.69   5142     0.29         0.23       0.05
300    304    206  98    1.1     89.28   5308     0.37         0.31       0.06
400    405    256  149   1.1     95.25   5541     0.44         0.39       0.05
500    505    337  167   1.1     98.40   5579     0.39         0.36       0.03
750    752    478  274   1.1     100.58  6333     0.55         0.54       0.02
1000   998    656  340   1.1     135.62  6833     2.59         2.50       0.08
------------------------------------------------------------------------------------

The Total column is the average peer count over the duration of the run; TCP and QUIC are the average numbers of peers connected over each transport. The total bandwidth is split by transport in the TCP(MB/s) and QUIC(MB/s) columns at the end.

The outlier is 1000 peers, where the reported bandwidth skyrocketed. This could be due to peers syncing from us.

In graph form:
[Graph: Bandwidth per Peer Count]
[Graph: Memory per Peer Count]
[Graph: CPU Usage per Peer Count]

It is somewhat probabilistic: with more peers we are more likely to have nodes syncing from us and requesting more data, so bandwidth can grow beyond these results (each run lasted about an hour on mainnet).

In either case, I think it's fine to increase our peer count to 200, and potentially up to the 500 mark.

This should resolve #4962 and #6920

@AgeManning
Member Author

Overall I propose an initial increase to 200. Once we see how the network handles this, we can consider a further increase to 300 or 400 in a future release.

jimmygchen added a commit that referenced this pull request Aug 18, 2025
Squashed commit of the following:

commit 60ea5c6
Author: Age Manning <[email protected]>
Date:   Mon Aug 18 10:31:31 2025 +1000

    Fmt

commit 3f7f6fc
Author: Age Manning <[email protected]>
Date:   Thu Aug 14 15:39:23 2025 +1000

    Double lighthouse's peer count for Fusaka
@jimmygchen jimmygchen mentioned this pull request Aug 18, 2025
Member

@jimmygchen jimmygchen left a comment


Looks good to me!

@jimmygchen jimmygchen added the ready-for-merge This PR is ready to merge. label Aug 21, 2025
@mergify

mergify bot commented Aug 21, 2025

This pull request has been removed from the queue for the following reason: checks failed.

The merge conditions cannot be satisfied due to failing checks:

You may have to fix your CI before adding the pull request to the queue again.
If you update this pull request to fix the CI, it will automatically be requeued once the queue conditions match again.
If you think this was a flaky issue instead, you can requeue the pull request, without updating it, by posting a @mergifyio requeue comment.

@jimmygchen jimmygchen removed the ready-for-merge This PR is ready to merge. label Aug 21, 2025
@jimmygchen jimmygchen added the ready-for-merge This PR is ready to merge. label Aug 21, 2025
@mergify mergify bot merged commit c9ffdf7 into sigp:unstable Aug 21, 2025
34 checks passed
mergify bot pushed a commit that referenced this pull request Aug 22, 2025
Was going to leave this as a comment on #7877 but then noticed it had already been merged.
We have `DEFAULT_TARGET_PEERS`, which was set to 50 and only used in the `Default` impl for `peer_manager`'s `Config`, which then gets overridden by this `lighthouse_network::Config`'s default.
This PR unifies everything on `DEFAULT_TARGET_PEERS`.
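
For illustration, a minimal sketch of that pattern (a simplified, hypothetical layout rather than Lighthouse's actual module structure):

```rust
/// Single source of truth for the default target peer count
/// (200 being the new default proposed in this PR).
pub const DEFAULT_TARGET_PEERS: usize = 200;

pub struct Config {
    /// The number of peers the node tries to maintain connections to.
    pub target_peers: usize,
}

impl Default for Config {
    fn default() -> Self {
        // Both the peer manager config and the wider network config derive
        // their default from the same constant, so changing it in one place
        // changes it everywhere.
        Self {
            target_peers: DEFAULT_TARGET_PEERS,
        }
    }
}
```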