Skip to content

Conversation

@michaelsproul
Copy link
Member

@michaelsproul michaelsproul commented Sep 22, 2025

Issue Addressed

As identified by a researcher during the Fusaka security competition, we were computing the proposer index incorrectly in some places by computing without lookahead.

Proposed Changes

  • Add "low level" checks to computation functions in consensus/types to ensure they error cleanly
  • Re-work the determination of proposer shuffling decision roots, which are now fork aware.
  • Re-work and simplify the beacon proposer cache to be fork-aware.
  • Optimise with_proposer_cache to use OnceCell.
  • All tests passing.
  • Resolve all remaining FIXME(sproul)s.
  • Unit tests for ProtoBlock::proposer_shuffling_root_for_child_block.
  • End-to-end regression test.
  • Test on pre-Fulu network.
  • Test on post-Fulu network.

Testing

  • The correctness of ProtoBlock::proposer_shuffling_root_for_child_block is tested by new tests added in e471315. They catch the bug that was fixed by 83128c4.

@michaelsproul michaelsproul added bug Something isn't working work-in-progress PR is a work-in-progress security fulu Required for the upcoming Fulu hard fork v8.0.0-rc.0 Q3 2025 release for Fusaka on Holesky labels Sep 22, 2025
@dapplion
Copy link
Collaborator

I looked at this with @michaelsproul today and the approach makes sense. I need a few more passes on my own to fully get the changes.

@michaelsproul michaelsproul added ready-for-review The code is ready for review and removed work-in-progress PR is a work-in-progress labels Sep 25, 2025
jimmygchen added a commit that referenced this pull request Sep 25, 2025
Squashed commit of the following:

commit 9b4e4eb
Author: Michael Sproul <[email protected]>
Date:   Thu Sep 25 13:16:44 2025 +1000

    Use correct decision block root in missed block test

commit 6da576c
Author: Michael Sproul <[email protected]>
Date:   Thu Sep 25 12:57:33 2025 +1000

    Oops

commit 83128c4
Author: Michael Sproul <[email protected]>
Date:   Thu Sep 25 12:56:03 2025 +1000

    Document and refine proposer_shuffling_root_for_child_block

commit a08fd93
Author: Michael Sproul <[email protected]>
Date:   Thu Sep 25 12:55:53 2025 +1000

    Remove FIXME in favour of comment

commit 353fdcb
Author: Michael Sproul <[email protected]>
Date:   Thu Sep 25 11:56:20 2025 +1000

    Use min_seed_lookahead in ChainSpec::proposer_shuffling_decision_slot

commit 4ceccd2
Author: Michael Sproul <[email protected]>
Date:   Thu Sep 25 11:23:41 2025 +1000

    Re-implement OnceCell optimisation

commit 0ed80ff
Author: Michael Sproul <[email protected]>
Date:   Wed Sep 24 19:02:33 2025 +1000

    Remove resolved FIXME

commit 645c869
Author: Michael Sproul <[email protected]>
Date:   Wed Sep 24 18:57:53 2025 +1000

    Simplify blob verification

commit 0917bfb
Author: Michael Sproul <[email protected]>
Date:   Wed Sep 24 18:51:37 2025 +1000

    More bug fixes and simplifications

commit d41c236
Author: Michael Sproul <[email protected]>
Date:   Wed Sep 24 14:09:44 2025 +1000

    Use new proposer cache method in block verif

commit b34af93
Author: Michael Sproul <[email protected]>
Date:   Wed Sep 24 12:43:09 2025 +1000

    Start using with_proposer_cache

commit dd799d8
Author: Michael Sproul <[email protected]>
Date:   Wed Sep 24 12:18:57 2025 +1000

    Laying the groundwork for safer proposer index calcs

commit b1c5308
Author: Michael Sproul <[email protected]>
Date:   Mon Sep 22 17:54:02 2025 +1000

    Add sanity checks in BeaconState::compute_proposer_indices
Comment on lines +1031 to +1035
// We do not run this check if this function is called from `upgrade_to_fulu`,
// which runs *after* the slot is incremented, and needs to compute the proposer
// shuffling for the epoch that was just transitioned into.
if self.fork_name_unchecked().fulu_enabled()
&& epoch < current_epoch.safe_add(spec.min_seed_lookahead)?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jtraglia This is the exception to the assert I mentioned, which allows us to compute the proposer indices with 0 lookahead only in the case of upgrade_to_fulu (when the state itself has not yet become the Fulu variant).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, this makes sense. I think these checks (this one & the one mentioned in the comment below) might be a little too complex to add to the specs. I still prefer just removing these two helper functions sometime in the future. I'll try to implement these in the specs locally just to get a feel for what it would look like though.

Comment on lines +1221 to +1238
// This isn't in the spec, but we remove the footgun that is requesting the current epoch
// for a Fulu state.
if let Ok(proposer_lookahead) = self.proposer_lookahead()
&& epoch >= self.current_epoch()
&& epoch <= self.next_epoch()?
{
let slots_per_epoch = E::slots_per_epoch() as usize;
let start_offset = if epoch == self.current_epoch() {
0
} else {
slots_per_epoch
};
return Ok(proposer_lookahead
.iter_from(start_offset)?
.take(slots_per_epoch)
.map(|x| *x as usize)
.collect());
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jtraglia Similarly, this block only comes into effect outside of the upgrade_to_fulu logic.

let epoch_cache: &EpochCache = state.epoch_cache();
let decision_block_root = state
.proposer_shuffling_decision_root(Hash256::zero())
.epoch_cache_decision_root(Hash256::zero())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The epoch cache is keyed on the last block of the previous epoch. Prior to Fulu this was the same as the proposer shuffling decision root, however that is no longer the case. So we use a separate function.

///
/// This function assumes that `child_block_epoch >= self.epoch`. It is the responsibility of
/// the caller to check this condition, or else incorrect results will be produced.
pub fn proposer_shuffling_root_for_child_block(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is a bit subtle. It was extracted from the 2-3 places where we used it previously (block/blob/data column verification).

It relates the proposer shuffling to the attester shufflings. The relationship is slightly different depending on whether Fulu is enabled. Prior to Fulu everything is shifted by an epoch, and then post Fulu things line up (proposer shuffling and attester shuffling for epoch N have the same decision block root).

Copy link
Member

@jimmygchen jimmygchen Sep 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice refactoring, much better keeping this kind of logic in one place

beacon_proposer_cache.lock().insert(
epoch,
duplicate_block_root,
decision_block_root,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously this test assumed that the duplicate_block_root was the decision root, but that is no longer true post Fulu.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the case for this one too?

.proposer_shuffling_decision_root(duplicate_block_root, &harness1.chain.spec)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is wrong, yeah, but doesn't actually do anything, because the block_root argument to proposer_shuffling_decision_root isn't used unless we're right on an epoch boundary (pre-Fulu), or there's a whole epoch of skipped slots (post Fulu).

I think I would prefer to completely rewrite these tests than tweaking them any more. There is an open issue for this:

let dependent_root = new_snapshot
.beacon_state
.proposer_shuffling_decision_root(self.genesis_block_root);
.attester_shuffling_decision_root(self.genesis_block_root, RelativeEpoch::Next);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These used to be equivalent, but are no longer. This dependent root is meant to be the attestation shuffling for the next epoch, so it makes a lot more sense to use that explicitly!

parent_block.proposer_shuffling_root_for_child_block(block_epoch, &chain.spec);

// We assign to a variable instead of using `if let Some` directly to ensure we drop the
// write lock before trying to acquire it again in the `else` clause.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the comment above is no longer relevant here, now we've moved the lock management into with_proposer_cache

///
/// - ProtoBlock::proposer_shuffling_root_for_child_block, and
/// - BeaconState::proposer_shuffling_decision_root{_at_epoch}
async fn proposer_shuffling_root_consistency_test(parent_slot: u64, child_slot: u64) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a little strange that these tests are under store_tests.rs, i guess it's more for convenience?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just write everything under store tests haha

Ok(())
);

let decision_block_root = _state2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be renamed to _state now it's being used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests do this a lot already, I hate them 🤣

Again, I reckon we just:

@jimmygchen
Copy link
Member

jimmygchen commented Sep 26, 2025

We have this deployed to both live testnets and fusaka-devnet-3 for a day now and I have not noticed any issues. I'm going to tick the testing boxes. But I'm going to keep trying to reproduce the attack on local devnets.

@michaelsproul
Copy link
Member Author

michaelsproul commented Sep 26, 2025

I think the attack on a live network might require a proposer cache miss. So potentially multiple skipped slots near an epoch boundary, plus the effective balance changes

}
}

pub fn with_proposer_cache<V, E: From<BeaconChainError> + From<BeaconStateError>>(
Copy link
Member

@jimmygchen jimmygchen Sep 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This forces the caller to implement From for both error types, it feels a bit strange but given this works with existing code, i think it's fine to leave it as it is.

let proposer_shuffling_root =
parent_block.proposer_shuffling_root_for_child_block(column_epoch, &chain.spec);

let proposer = chain.with_proposer_cache(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, we also avoid building committee caches unnecessarily with cheap_state_advance_to_obtain_committees here and use ensure_state_can_determine_proposers_for_epoch instead

/// - It must be the case that `state.canonical_root() == state_root`, but this function will not
/// check that.
pub fn ensure_state_is_in_epoch<E: EthSpec>(
pub fn ensure_state_can_determine_proposers_for_epoch<E: EthSpec>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be worth adding instrumenting tracing for this function (we used to call cheap_state_advance_to_obtain_committees which shows up in the gossip processing traces)

Copy link
Member

@jimmygchen jimmygchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite a difficult PR to review, as it covers quite a lot of code paths and edge cases, but after a few passes, i think it looks very solid! Impressed with how thorough you are with this one. I think you've handled all the code paths I can find and can't really spot any issues with correctness.

I think this is good to merge and we can address the comments next week.

@jimmygchen jimmygchen added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Sep 26, 2025
@mergify mergify bot added the queued label Sep 26, 2025
mergify bot added a commit that referenced this pull request Sep 26, 2025

#[tokio::test]
async fn proposer_shuffling_root_consistency_same_epoch() {
proposer_shuffling_root_consistency_test(32, 39).await;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can do in another PR, but writing the numbers like will help understand them

Suggested change
proposer_shuffling_root_consistency_test(32, 39).await;
proposer_shuffling_root_consistency_test(
E::slots_per_epoch().as_u64() * 4,
E::slots_per_epoch().as_u64() * 5 - 1,
).await;

@mergify mergify bot merged commit c754234 into sigp:unstable Sep 26, 2025
37 checks passed
@mergify mergify bot removed the queued label Sep 26, 2025
}

#[tokio::test]
async fn proposer_shuffling_changing_with_lookahead() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice test 💯

index: pre_deposit_state.eth1_deposit_index(),
pubkey: validator_to_topup.pubkey,
withdrawal_credentials: validator_to_topup.withdrawal_credentials,
amount: 63_000_000_000,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you chose this value to find a balance that will trigger a change in the proposer selection

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The churn limit is 64 ETH, so this is just the largest value that fits under that threshold and gets applied quickly. It also happens to change the shuffling with the default seed, and I think should be likely to change it with such a small # of validators

mergify bot pushed a commit that referenced this pull request Sep 28, 2025
N/A


  In #8101 , when we modified the logic to get the proposer index post fulu, we seem to have missed advancing the state at the fork boundaries to get the right `Fork` for signature verification.
This led to lighthouse failing all gossip verification right after transitioning to fulu that was observed on the holesky shadow fork
```
Sep 26 14:24:00.088 DEBUG Rejected gossip block                         error: "InvalidSignature(ProposerSignature)", graffiti: "grandine-geth-super-1", slot: 640
Sep 26 14:24:00.099 WARN  Could not verify block for gossip. Rejecting the block  error: InvalidSignature(ProposerSignature)
```

I'm not completely sure this is the correct fix, but this fixes the issue with `InvalidProposerSignature` on the holesky shadow fork.

Thanks to @eserilev for helping debug this


Co-Authored-By: Pawan Dhananjay <[email protected]>
@michaelsproul michaelsproul deleted the fulu-proposer-validation branch September 28, 2025 23:33
mergify bot pushed a commit that referenced this pull request Sep 29, 2025
Follow-up to the bug fixed in:

- #8121

This fixes the root cause of that bug, which was introduced by me in:

- #8101

Lion identified the issue here:

- #8101 (comment)


  In the methods that compute the proposer shuffling decision root, ensure we don't use lookahead for the Fulu fork epoch itself. This is accomplished by checking if Fulu is enabled at `epoch - 1`, i.e. if `epoch > fulu_fork_epoch`.

I haven't updated the methods that _compute_ shufflings to use these new corrected bounds (e.g. `BeaconState::compute_proposer_indices`), although we could make this change in future. The `get_beacon_proposer_indices` method already gracefully handles the Fulu boundary case by using the `proposer_lookahead` field (if initialised).


Co-Authored-By: Michael Sproul <[email protected]>
mergify bot pushed a commit that referenced this pull request Oct 20, 2025
Addressing more review comments from:

- #8101

I've also tweaked a few more things that I think are minor bugs.


  - Instrument `ensure_state_can_determine_proposers_for_epoch`
- Fix `block_root` usage in `compute_proposer_duties_from_head`. This was a regression introduced in 8101 😬 .
- Update the `state_advance_timer` to prime the next-epoch proposer cache post-Fulu.


Co-Authored-By: Michael Sproul <[email protected]>
jchavarri pushed a commit to jchavarri/lighthouse that referenced this pull request Oct 21, 2025
Addressing more review comments from:

- sigp#8101

I've also tweaked a few more things that I think are minor bugs.


  - Instrument `ensure_state_can_determine_proposers_for_epoch`
- Fix `block_root` usage in `compute_proposer_duties_from_head`. This was a regression introduced in 8101 😬 .
- Update the `state_advance_timer` to prime the next-epoch proposer cache post-Fulu.


Co-Authored-By: Michael Sproul <[email protected]>
lmnzx pushed a commit to lmnzx/lighthouse that referenced this pull request Nov 4, 2025
As identified by a researcher during the Fusaka security competition, we were computing the proposer index incorrectly in some places by computing without lookahead.


  - [x] Add "low level" checks to computation functions in `consensus/types` to ensure they error cleanly
- [x] Re-work the determination of proposer shuffling decision roots, which are now fork aware.
- [x] Re-work and simplify the beacon proposer cache to be fork-aware.
- [x] Optimise `with_proposer_cache` to use `OnceCell`.
- [x] All tests passing.
- [x] Resolve all remaining `FIXME(sproul)`s.
- [x] Unit tests for `ProtoBlock::proposer_shuffling_root_for_child_block`.
- [x] End-to-end regression test.
- [x] Test on pre-Fulu network.
- [x] Test on post-Fulu network.


Co-Authored-By: Michael Sproul <[email protected]>
lmnzx pushed a commit to lmnzx/lighthouse that referenced this pull request Nov 4, 2025
N/A


  In sigp#8101 , when we modified the logic to get the proposer index post fulu, we seem to have missed advancing the state at the fork boundaries to get the right `Fork` for signature verification.
This led to lighthouse failing all gossip verification right after transitioning to fulu that was observed on the holesky shadow fork
```
Sep 26 14:24:00.088 DEBUG Rejected gossip block                         error: "InvalidSignature(ProposerSignature)", graffiti: "grandine-geth-super-1", slot: 640
Sep 26 14:24:00.099 WARN  Could not verify block for gossip. Rejecting the block  error: InvalidSignature(ProposerSignature)
```

I'm not completely sure this is the correct fix, but this fixes the issue with `InvalidProposerSignature` on the holesky shadow fork.

Thanks to @eserilev for helping debug this


Co-Authored-By: Pawan Dhananjay <[email protected]>
lmnzx pushed a commit to lmnzx/lighthouse that referenced this pull request Nov 4, 2025
Follow-up to the bug fixed in:

- sigp#8121

This fixes the root cause of that bug, which was introduced by me in:

- sigp#8101

Lion identified the issue here:

- sigp#8101 (comment)


  In the methods that compute the proposer shuffling decision root, ensure we don't use lookahead for the Fulu fork epoch itself. This is accomplished by checking if Fulu is enabled at `epoch - 1`, i.e. if `epoch > fulu_fork_epoch`.

I haven't updated the methods that _compute_ shufflings to use these new corrected bounds (e.g. `BeaconState::compute_proposer_indices`), although we could make this change in future. The `get_beacon_proposer_indices` method already gracefully handles the Fulu boundary case by using the `proposer_lookahead` field (if initialised).


Co-Authored-By: Michael Sproul <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working fulu Required for the upcoming Fulu hard fork ready-for-merge This PR is ready to merge. security v8.0.0-rc.0 Q3 2025 release for Fusaka on Holesky

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants