Fix bugs in rebasing of states prior to finalization #7849
Conversation
jimmygchen left a comment
Good catch! The fix makes sense to me.
Will merge once we get a +1 from beaconcha.in, but I'm fairly sure this is good.
```rust
let current_cache_is_incomplete = pubkey_cache.len() < num_validators;
let base_cache_is_compatible = base_pubkey_cache.len() <= num_validators;
let base_cache_is_superior = base_pubkey_cache.len() > pubkey_cache.len();
```
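To illustrate how these checks interact, here is a self-contained sketch; the function name and the way the flags are combined are hypothetical and may differ from the actual code in this PR:

```rust
/// Sketch only: a hypothetical way the three flags from the snippet above could
/// be combined. The safety-critical one is `base_cache_is_compatible`, which
/// ensures a reused cache is never longer than the state's validator list.
fn should_reuse_base_cache(
    pubkey_cache_len: usize,
    base_pubkey_cache_len: usize,
    num_validators: usize,
) -> bool {
    let current_cache_is_incomplete = pubkey_cache_len < num_validators;
    let base_cache_is_compatible = base_pubkey_cache_len <= num_validators;
    let base_cache_is_superior = base_pubkey_cache_len > pubkey_cache_len;
    // Hypothetical combination: only reuse the base cache when it is valid for
    // this state and strictly more complete than the current cache.
    current_cache_is_incomplete && base_cache_is_compatible && base_cache_is_superior
}
```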
Even if the base cache is not superior, is it worth using it to save memory? I.e. if it's missing just 1 pubkey, is it okay to also re-use it?
oh yeah maybe
Beaconcha.in have tested this PR and confirmed it fixes the issue, so gonna merge as-is. We can come back for this optimisation later if we like (it is orthogonal to fixing the bug).
Issue Addressed
Attempt to fix this error reported by beaconcha.in on their Hoodi archive nodes:

Proposed Changes
There are only a handful of places where we call `iter_from`.

This one is safe by construction (the check immediately prior ensures `self.pubkeys.len()` is not out of bounds):

lighthouse/beacon_node/beacon_chain/src/validator_pubkey_cache.rs
Lines 84 to 90 in cfb1f73
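The pattern, paraphrased with toy types rather than the actual code at the lines above, is roughly:

```rust
/// Toy paraphrase (not the actual Lighthouse code) of the guarded pattern: the
/// length check immediately before the `iter_from`-style access guarantees the
/// start index is within bounds.
struct PubkeyCache {
    pubkeys: Vec<[u8; 48]>, // stand-in for cached BLS pubkeys
}

impl PubkeyCache {
    fn import_new_pubkeys(&mut self, validators: &[[u8; 48]]) {
        // Only start iterating from `self.pubkeys.len()` when the validator list
        // is longer than the cache, so the start index can never be out of bounds.
        if validators.len() > self.pubkeys.len() {
            for pubkey in &validators[self.pubkeys.len()..] {
                self.pubkeys.push(*pubkey);
            }
        }
    }
}
```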
This one should also be safe, and the indexes used here would not be as large as the ones in the reported error:
lighthouse/consensus/state_processing/src/per_epoch_processing/single_pass.rs
Lines 365 to 368 in cfb1f73
Which leaves one remaining usage, which must be the culprit:
lighthouse/consensus/types/src/beacon_state.rs
Lines 2109 to 2113 in cfb1f73
This indexing relies on the invariant that `self.pubkey_cache().len() <= self.validators.len()`. We mostly maintain that invariant, except in `rebase_caches_on` (fixed in this PR).
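A minimal sketch of why that indexing needs the invariant, using toy types and a hypothetical helper name rather than the actual code at the lines above:

```rust
/// Toy illustration (not the code at the lines above) of indexing that is only
/// valid while `pubkey_cache.len() <= validators.len()` holds.
fn update_pubkey_cache(validators: &[u64], pubkey_cache: &mut Vec<u64>) -> Result<(), String> {
    let start = pubkey_cache.len();
    // If the invariant is violated, `start` lies past the end of `validators`
    // and this fails with an out-of-bounds style error, matching the symptom
    // in the report.
    let new_entries = validators
        .get(start..)
        .ok_or_else(|| format!("index {start} out of bounds"))?;
    pubkey_cache.extend_from_slice(new_entries);
    Ok(())
}
```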
The other bug is that we were calling `rebase_on_finalized` for all "hot" states, which post-v7.1.0 includes states prior to the split that are required by the hdiff grid. This is how we end up calling something like `genesis_state.rebase_on(&split_state)`, which then corrupts the pubkey cache of the genesis state using the newer pubkey cache from the split state.
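To make the failure mode concrete, here is a toy scenario (reusing the `update_pubkey_cache` sketch above; the numbers and mechanics are illustrative only) showing how rebasing a pre-split state on the split state leaves it with a cache longer than its validator list:

```rust
fn main() {
    // Toy scenario: the genesis state has few validators, the split state has many.
    let genesis_validators: Vec<u64> = (0..4).collect();
    let split_pubkey_cache: Vec<u64> = (0..10).collect();

    // A rebase that blindly adopts the newer, larger cache leaves the older
    // state with more cached pubkeys (10) than validators (4).
    let mut genesis_pubkey_cache = split_pubkey_cache.clone();
    assert!(genesis_pubkey_cache.len() > genesis_validators.len());

    // The next cache update on the genesis state then fails, because it tries
    // to index the validator list starting at position 10.
    let result = update_pubkey_cache(&genesis_validators, &mut genesis_pubkey_cache);
    assert!(result.is_err());
}
```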