
Conversation

@michaelsproul
Member

Issue Addressed

Attempt to fix this error reported by beaconcha.in on their Hoodi archive nodes:

{"code":500,"message":"UNHANDLED_ERROR: DBError(CacheBuildError(BeaconState(MilhouseError(OutOfBoundsIterFrom { index: 1199549, len: 1060000 }))))","stacktraces":[]}

Proposed Changes

There are only a handful of places where we call iter_from.

This one is safe by construction (the check immediately prior ensures self.pubkeys.len() is not out of bounds):

if state.validators().len() > self.pubkeys.len() {
    self.import(
        state
            .validators()
            .iter_from(self.pubkeys.len())?
            .map(|v| v.pubkey),
    )

This one should also be safe, and the indexes used here would not be as large as the ones in the reported error:

state
    .pending_deposits()?
    .iter_from(ctxt.next_deposit_index)?
    .cloned(),

That leaves one remaining usage, which must be the culprit:

pub fn update_pubkey_cache(&mut self) -> Result<(), Error> {
    let mut pubkey_cache = mem::take(self.pubkey_cache_mut());
    let start_index = pubkey_cache.len();
    for (i, validator) in self.validators().iter_from(start_index)?.enumerate() {

This indexing relies on the invariant that self.pubkey_cache().len() <= self.validators().len(). We mostly maintain that invariant, except in rebase_caches_on (fixed in this PR).

The other bug is that we were calling rebase_on_finalized for all "hot" states, which post-v7.1.0 includes states prior to the split that are required by the hdiff grid. This is how we end up calling something like genesis_state.rebase_on(&split_state), which corrupts the pubkey cache of the genesis state with the newer pubkey cache from the split state.
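To make the failure mode concrete, here is a minimal, self-contained sketch using toy types (ToyState, validators_iter_from and rebase_caches_on_unchecked are illustrative stand-ins, not the real BeaconState/Milhouse API). It shows how unconditionally adopting the base state's larger pubkey cache breaks the invariant, after which the iter_from-style call in update_pubkey_cache fails like the reported error:

// Illustrative only: toy stand-ins for the Milhouse list and the state's pubkey cache.
#[derive(Debug)]
struct OutOfBoundsIterFrom {
    index: usize,
    len: usize,
}

struct ToyState {
    validators: Vec<u64>,   // stand-in for the validator registry
    pubkey_cache: Vec<u64>, // stand-in for the pubkey cache
}

impl ToyState {
    // Mirrors the shape of Milhouse's `iter_from`: error if the start index exceeds the length.
    fn validators_iter_from(
        &self,
        index: usize,
    ) -> Result<impl Iterator<Item = &u64>, OutOfBoundsIterFrom> {
        if index > self.validators.len() {
            return Err(OutOfBoundsIterFrom {
                index,
                len: self.validators.len(),
            });
        }
        Ok(self.validators[index..].iter())
    }

    // Pre-fix behaviour (assumed, simplified): unconditionally adopt the base state's cache.
    fn rebase_caches_on_unchecked(&mut self, base: &ToyState) {
        self.pubkey_cache = base.pubkey_cache.clone();
    }
}

fn main() {
    // Lengths taken from the reported error: the split state has more validators
    // than the older pre-split state required by the hdiff grid.
    let split_state = ToyState {
        validators: (0..1_199_549).collect(),
        pubkey_cache: (0..1_199_549).collect(),
    };
    let mut old_state = ToyState {
        validators: (0..1_060_000).collect(),
        pubkey_cache: (0..1_060_000).collect(),
    };

    // Post-v7.1.0, pre-split hot states were also rebased on the split state,
    // violating `pubkey_cache.len() <= validators.len()` for the old state.
    old_state.rebase_caches_on_unchecked(&split_state);

    // `update_pubkey_cache` then starts iterating from the (now too large) cache length:
    let start_index = old_state.pubkey_cache.len();
    match old_state.validators_iter_from(start_index) {
        Ok(_) => println!("cache can be extended"),
        Err(e) => println!("{:?}", e), // OutOfBoundsIterFrom { index: 1199549, len: 1060000 }
    }
}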

@michaelsproul michaelsproul added bug Something isn't working database HTTP-API v8.0.0-rc.0 Q3 2025 release for Fusaka on Holesky labels Aug 11, 2025
@michaelsproul michaelsproul requested a review from dapplion August 11, 2025 01:56
@jimmygchen
Member


Good catch! The fix makes sense to me.

@michaelsproul
Member Author

Will merge once we get a +1 from beaconcha.in, but I'm fairly sure this is good.


let current_cache_is_incomplete = pubkey_cache.len() < num_validators;
let base_cache_is_compatible = base_pubkey_cache.len() <= num_validators;
let base_cache_is_superior = base_pubkey_cache.len() > pubkey_cache.len();
Collaborator

Even if the base cache is not superior, is it worth using it to save memory? i.e. if it's missing just 1 pubkey, is it okay to also re-use it?

Member Author

oh yeah maybe

Member Author

Beaconcha.in have tested this PR and confirmed it fixes the issue, so gonna merge as-is. We can come back for this optimisation later if we like (it is orthogonal to fixing the bug).
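For reference, a hedged sketch of the reuse decision under discussion. The flag names match the diff above, but how they combine here is an assumption rather than the exact Lighthouse code; the commented-out condition shows the relaxation suggested in this thread (reusing a compatible base cache even when it is not strictly longer):

// Sketch only: the lengths are those used in rebase_caches_on; the combination is assumed.
fn should_reuse_base_cache(
    num_validators: usize,
    pubkey_cache_len: usize,
    base_pubkey_cache_len: usize,
) -> bool {
    let current_cache_is_incomplete = pubkey_cache_len < num_validators;
    let base_cache_is_compatible = base_pubkey_cache_len <= num_validators;
    let base_cache_is_superior = base_pubkey_cache_len > pubkey_cache_len;

    // Assumed current behaviour: only swap in the base cache when it is strictly longer.
    current_cache_is_incomplete && base_cache_is_compatible && base_cache_is_superior

    // Possible later optimisation (per this thread): reuse any compatible base cache
    // to share memory, even if it is missing a pubkey or two:
    // current_cache_is_incomplete && base_cache_is_compatible
}

fn main() {
    // Base cache is one pubkey short of the current cache: not reused under the
    // assumed current behaviour, but it could be under the proposed relaxation.
    assert!(!should_reuse_base_cache(100, 100, 99));
}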

@michaelsproul michaelsproul added the ready-for-merge This PR is ready to merge. label Aug 12, 2025
mergify bot added a commit that referenced this pull request Aug 12, 2025
@mergify mergify bot merged commit 918121e into unstable Aug 12, 2025
34 checks passed
@mergify mergify bot deleted the rebase-cache-fix branch August 12, 2025 02:19