Skip to content

fix: only assign aggregator role to node 1 in simtest#642

Merged
anshalshukla merged 1 commit intomainfrom
fix/641-single-aggregator-simtest
Mar 8, 2026
Merged

fix: only assign aggregator role to node 1 in simtest#642
anshalshukla merged 1 commit intomainfrom
fix/641-single-aggregator-simtest

Conversation

@zclawz
Copy link
Copy Markdown
Contributor

@zclawz zclawz commented Mar 8, 2026

Summary

Fixes #641

When running zig build simtest, all three nodes were initialized with .is_aggregator = beamcmd.@"is-aggregator", causing all of them to act as aggregators when --is-aggregator true is passed. Per spec, only one aggregator should exist per subnet.

Fix

Hardcode is_aggregator = false for nodes 2 and 3. Only node 1 (the primary node) inherits the --is-aggregator CLI flag.

Changes

  • pkgs/cli/src/main.zig: beam_node_2 and beam_node_3 now always initialize with is_aggregator = false

When running `zig build simtest`, all three nodes were initialized with
`.is_aggregator = beamcmd.@"is-aggregator"`, causing all of them to act
as aggregators when the --is-aggregator flag is set. Per the spec, only
one aggregator should exist per subnet.

Fix: hardcode `is_aggregator = false` for nodes 2 and 3. Only node 1
(the primary node) inherits the CLI flag.

Fixes #641
@anshalshukla anshalshukla merged commit 5cf21d4 into main Mar 8, 2026
12 checks passed
@anshalshukla anshalshukla deleted the fix/641-single-aggregator-simtest branch March 8, 2026 18:01
zclawz pushed a commit that referenced this pull request Mar 9, 2026
The mock network's publish() and onGossip() were calling
gossipHandler.onGossip(..., scheduleOnLoop=true), which defers gossip
delivery via a 1ms event-loop timer. This caused a race condition in
the simtest: with a single aggregator (node 1), attestations from nodes
2 and 3 were scheduled for async delivery but not yet in node 1's
gossip_signatures when it aggregated at interval 2. Result: only 1/3
validator participation, enough for justification but not finalization.

The scheduleOnLoop flag exists for the real libp2p network (where
handlers can't be called synchronously from the network thread). In the
mock, all nodes share the same event loop and gossip should be delivered
synchronously to correctly simulate instant peer propagation.

Fix: pass scheduleOnLoop=false in mock.zig's publish() and onGossip().

Fixes flaky simtest introduced after #642.
zclawz pushed a commit that referenced this pull request Mar 10, 2026
Root cause of finalization deadlock with 2 active validators:

1. Block processing calls onAttestation(is_from_block=true), which
   clears latestNew for every validator whose attestation is in the block.
2. updateSafeTargetUnlocked uses computeFCHeadUnlocked(from_known=false)
   which reads only latestNew.
3. After a block is processed at interval 0, the proposer skips gossip
   attestation at interval 1 — so only 1 validator fills latestNew.
4. At interval 3, cutoff=2 fails (only 1 validator in latestNew) and
   safeTarget resets back to genesis.
5. With safeTarget=0, getAttestationTarget always returns a slot ≤
   latest_justified → attestation is skipped → finalization deadlocks.

Fix: safeTarget is monotonically non-decreasing. Once 2/3 of validators
confirmed a block, that signal is preserved even when latestNew is
subsequently cleared by block inclusion.

Fixes simtest finalization with single aggregator (post #642).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make 3 nodes simulate test to just use one aggregator

3 participants