core/rawdb, triedb/pathdb: implement history indexer #31156

rjl493456442 · 2025-02-11T11:54:33Z

This pull request is part 1 of shipping the core of the archive node in path mode.
The following have been implemented:

state history index definition
state history indexer
state history reader

rjl493456442 · 2025-05-30T06:29:05Z

triedb/pathdb/history_index_block.go

+	min     uint64 // The minimum state ID retained within the block
+	max     uint64 // The maximum state ID retained within the block
+	entries uint32 // The number of state mutation records retained within the block
+	id      uint32 // The id of the index block


Can we get rid of the id and min fields?

I think you can easily get rid of min, and it's also possible not to store id. Since id is part of the key, we could add it to the entry when it's loaded and not store it in the db.

I would keep the ID for a while, for the following reasons:

It's technically possible to resolve the IDs from the database key by iterating the database. But

it's IO expensive

it's not robust, e.g., the stale index blocks will also be scanned

it's fairly easy to purge the existing indexes and regenerate them later
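
To make the trade-off concrete, here is a minimal sketch of a block descriptor that drops min but still stores id, as discussed above. The type name, the fixed-width big-endian layout, and the helper names are assumptions made for illustration and are not taken from the PR; the uint16 width for entries follows the later commit "use uint16 for entries field".

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// indexBlockDesc is a hypothetical descriptor of one index block.
// The min field is dropped; id is kept so it does not have to be
// recovered from the database key.
type indexBlockDesc struct {
	max     uint64 // maximum state ID retained within the block
	entries uint16 // number of state mutation records in the block
	id      uint32 // id of the index block
}

// descSize is the encoded size under the assumed layout: 8 + 2 + 4 bytes.
const descSize = 14

// encode serializes the descriptor with fixed-width big-endian fields.
func (d indexBlockDesc) encode() []byte {
	buf := make([]byte, descSize)
	binary.BigEndian.PutUint64(buf[0:8], d.max)
	binary.BigEndian.PutUint16(buf[8:10], d.entries)
	binary.BigEndian.PutUint32(buf[10:14], d.id)
	return buf
}

// decodeDesc parses a descriptor produced by encode.
func decodeDesc(buf []byte) (indexBlockDesc, error) {
	if len(buf) != descSize {
		return indexBlockDesc{}, errors.New("invalid descriptor size")
	}
	return indexBlockDesc{
		max:     binary.BigEndian.Uint64(buf[0:8]),
		entries: binary.BigEndian.Uint16(buf[8:10]),
		id:      binary.BigEndian.Uint32(buf[10:14]),
	}, nil
}

func main() {
	d := indexBlockDesc{max: 1024, entries: 7, id: 3}
	got, err := decodeDesc(d.encode())
	fmt.Println(got == d, err)
}
```

Storing id costs four bytes per descriptor in this layout; in exchange, the reader does not need to iterate over database keys to learn which blocks exist.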


MariusVanDerWijden · 2025-05-30T11:19:05Z

triedb/pathdb/history_indexer.go

+func (b *batchIndexer) process(h *history, historyID uint64) error {
+	for _, address := range h.accountList {
+		b.counter += 1
+		b.accounts[address] = append(b.accounts[address], historyID)


Just thinking out loud: all of the following code depends on the historyIDs being sorted. If we ever called process with an out-of-order historyID, it would no longer work. Should we check that the lists are sorted before writing them out?

Histories are resolved from the freezer in batches, so I think the assumption holds that they are processed in order.
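
For reference, the kind of guard suggested above could look like the following sketch. The function name checkAscending is hypothetical, and requiring strictly increasing IDs is an assumption (each history is presumed to be processed at most once); the PR itself relies on in-order processing rather than a check like this.

```go
package main

import "fmt"

// checkAscending returns an error if ids are not strictly increasing.
// It is a hypothetical guard that could run before an ID list is written out.
func checkAscending(ids []uint64) error {
	for i := 1; i < len(ids); i++ {
		if ids[i] <= ids[i-1] {
			return fmt.Errorf("history ID out of order: %d follows %d", ids[i], ids[i-1])
		}
	}
	return nil
}

func main() {
	fmt.Println(checkAscending([]uint64{1, 2, 5}))
	fmt.Println(checkAscending([]uint64{1, 5, 3}))
}
```

The check is linear in the list length, so it would add little overhead compared with the database write that follows it.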

MariusVanDerWijden · 2025-06-15T08:23:48Z

Are we losing something if we use the account hash instead of the account address?

MariusVanDerWijden · 2025-06-18T09:39:03Z

core/rawdb/accessors_history.go

+		if err == nil {
+			return
+		}
+		if errors.Is(err, ethdb.ErrTooManyKeys) {


I don't understand this logic. If the range delete fails with ErrTooManyKeys, we keep retrying the deletion. Wouldn't it always fail with ErrTooManyKeys then?

Nope, each call will make some progress by deleting items. Ultimately it will remove all the items in the range.
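
The retry behaviour described in this reply can be illustrated with a toy store. Everything below is a stand-in written for this example: errTooManyKeys mimics ethdb.ErrTooManyKeys, and boundedStore is a hypothetical in-memory store that removes at most limit keys per call. It is not the database code from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// errTooManyKeys is a local stand-in for ethdb.ErrTooManyKeys.
var errTooManyKeys = errors.New("too many keys in deleted range")

// boundedStore is a hypothetical store that deletes at most limit keys
// per deleteRange call. limit must be positive.
type boundedStore struct {
	keys  []string
	limit int
}

// deleteRange removes up to limit keys in [start, end) and returns
// errTooManyKeys if keys in the range are still left afterwards.
func (s *boundedStore) deleteRange(start, end string) error {
	var (
		kept    []string
		deleted int
		more    bool
	)
	for _, k := range s.keys {
		inRange := k >= start && k < end
		switch {
		case inRange && deleted < s.limit:
			deleted++
		case inRange:
			more = true
			kept = append(kept, k)
		default:
			kept = append(kept, k)
		}
	}
	s.keys = kept
	if more {
		return errTooManyKeys
	}
	return nil
}

// deleteAll retries deleteRange until the range is empty, and reports
// how many calls were needed. Any other error aborts the loop.
func deleteAll(s *boundedStore, start, end string) (int, error) {
	calls := 0
	for {
		calls++
		err := s.deleteRange(start, end)
		if err == nil {
			return calls, nil
		}
		if !errors.Is(err, errTooManyKeys) {
			return calls, err
		}
	}
}

func main() {
	s := &boundedStore{keys: []string{"a", "b", "c", "d", "e"}, limit: 2}
	calls, err := deleteAll(s, "a", "e")
	fmt.Println(calls, err, s.keys)
}
```

The loop terminates because every failed call still removes limit keys, so the number of keys left in the range shrinks on each iteration.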

rjl493456442 · 2025-06-23T04:52:39Z

The total storage size of a fully synced archive node is around 1.9TB.

gary@dev:~/hdd2$ du -sh geth-ancient-mainnet-archive/
1.5T    geth-ancient-mainnet-archive/
gary@dev:~/hdd2$ du -sh geth-ancient-mainnet-archive/ancient/chain/
921G    geth-ancient-mainnet-archive/ancient/chain/
gary@dev:~/hdd2$ du -sh geth-ancient-mainnet-archive/ancient/state
539G    geth-ancient-mainnet-archive/ancient/state

gary@dev:~$ du -sh mount/geth/geth
413G    mount/geth/geth

+-----------------------+-----------------------------+------------+------------+
|       DATABASE        |          CATEGORY           |    SIZE    |   ITEMS    |
+-----------------------+-----------------------------+------------+------------+
| Key-Value store       | Headers                     | 2.37 MiB   |       3640 |
| Key-Value store       | Bodies                      | 333.02 MiB |       3640 |
| Key-Value store       | Receipt lists               | 316.00 MiB |       3639 |
| Key-Value store       | Difficulties (deprecated)   | 0.00 B     |          0 |
| Key-Value store       | Block number->hash          | 149.26 KiB |       3639 |
| Key-Value store       | Block hash->number          | 888.96 MiB |   22735156 |
| Key-Value store       | Transaction index           | 13.84 GiB  |  401645299 |
| Key-Value store       | Log index filter-map rows   | 0.00 B     |          0 |
| Key-Value store       | Log index last-block-of-map | 0.00 B     |          0 |
| Key-Value store       | Log index block-lv          | 0.00 B     |          0 |
| Key-Value store       | Log bloombits (deprecated)  | 0.00 B     |          0 |
| Key-Value store       | Contract codes              | 10.19 GiB  |    1699192 |
| Key-Value store       | Hash trie nodes             | 0.00 B     |          0 |
| Key-Value store       | Path trie state lookups     | 888.86 MiB |   22732693 |
| Key-Value store       | Path trie account nodes     | 47.26 GiB  |  409852077 |
| Key-Value store       | Path trie storage nodes     | 179.98 GiB | 1791011965 |
| Key-Value store       | Path state history indexes  | 290.62 GiB | 4112383456 |
| Key-Value store       | Verkle trie nodes           | 0.00 B     |          0 |
| Key-Value store       | Verkle trie state lookups   | 0.00 B     |          0 |
| Key-Value store       | Trie preimages              | 2.07 MiB   |      31025 |
| Key-Value store       | Account snapshot            | 13.75 GiB  |  299285136 |
| Key-Value store       | Storage snapshot            | 95.42 GiB  | 1322797351 |
| Key-Value store       | Beacon sync headers         | 18.25 MiB  |      29397 |
| Key-Value store       | Clique snapshots            | 0.00 B     |          0 |
| Key-Value store       | Singleton metadata          | 202.94 MiB |         15 |
| Ancient store (Chain) | Headers                     | 10.92 GiB  |   22731517 |
| Ancient store (Chain) | Hashes                      | 823.78 MiB |   22731517 |
| Ancient store (Chain) | Bodies                      | 655.40 GiB |   22731517 |
| Ancient store (Chain) | Receipts                    | 253.37 GiB |   22731517 |
| Ancient store (State) | Storage.Index               | 203.80 GiB |   22732692 |
| Ancient store (State) | Account.Data                | 141.52 GiB |   22732692 |
| Ancient store (State) | Storage.Data                | 50.38 GiB  |   22732692 |
| Ancient store (State) | History.Meta                | 1.67 GiB   |   22732692 |
| Ancient store (State) | Account.Index               | 141.45 GiB |   22732692 |
+-----------------------+-----------------------------+------------+------------+
|                                    TOTAL            |  2.06 TIB  |            |
+-----------------------+-----------------------------+------------+------------+


This pull request is part-1 for shipping the core part of archive node in PBSS mode.

rjl493456442 requested a review from holiman as a code owner February 11, 2025 11:54

rjl493456442 force-pushed the pbss-archive-p1 branch 4 times, most recently from b661a71 to 53691ee Compare February 13, 2025 05:56

rjl493456442 added post-prague pbss-archive labels Feb 27, 2025

rjl493456442 force-pushed the pbss-archive-p1 branch 2 times, most recently from e4ac9f4 to a12c5a2 Compare May 28, 2025 12:01

rjl493456442 commented May 30, 2025

triedb/pathdb/history_index_block.go (outdated)

rjl493456442 commented May 30, 2025

triedb/pathdb/history_index_block.go

rjl493456442 commented May 30, 2025

triedb/pathdb/history_index_block.go

MariusVanDerWijden reviewed May 30, 2025

MariusVanDerWijden reviewed Jun 18, 2025

rjl493456442 force-pushed the pbss-archive-p1 branch from 9b9f23d to 3ee9879 Compare June 20, 2025 12:00

rjl493456442 added 13 commits June 22, 2025 20:42

core/rawdb, triedb/pathdb: implement history indexer

ca0a0fe

triedb/pathdb: only enable history indexing in archive mode

d326be0

core/rawdb: track the metadata key

f7ae5b6

triedb/pathdb: update comment

c428df2

cmd/utils: remove the archive restriction on path

3036336

core: enable history indexing in archive mode

a9d42f3

triedb/pathdb: add logs

436edce

triedb/pathdb: fix tests

e0cc4df

triedb/pathdb: use uint16 for entries field

27fbfdc

triedb/pathdb: introduce metadata and version tracking

69fa068

triedb/pathdb: use uint16 for restart and uint8 for restartLen

03d507d

triedb/pathdb: get rid of min field

297cebd

triedb/pathdb: improve error handling

6f29a1e

rjl493456442 added 7 commits June 22, 2025 20:44

triedb/pathdb: improve legacy data handling

dfff224

core/rawdb: fix range deletion

bd9ead8

triedb/pathdb: bump batch size

75e40f9

ethdb/pebble: track iterator number

1370ed7

core/rawdb, triedb/pathdb: use account hash as the database key

aa65b94

triedb/pathdb: flush mutation records in parallel

fdac3b7

core/rawdb: improve range deletion

14561c6

rjl493456442 force-pushed the pbss-archive-p1 branch from 3ee9879 to 14561c6 Compare June 22, 2025 12:48

core/rawdb: try new scheme

9b09894

MariusVanDerWijden reviewed Jun 24, 2025

core/rawdb/schema.go (outdated)

MariusVanDerWijden reviewed Jun 24, 2025

core/rawdb/schema.go (outdated)

core/rawdb: address comments

8ec14d7

rjl493456442 added the status:triage label Jun 24, 2025

fjl added this to the 1.15.12 milestone Jun 24, 2025

fjl approved these changes Jun 24, 2025


fjl merged commit 9c5c0e3 into ethereum:master Jun 24, 2025
3 of 4 checks passed

BrewTestBot mentioned this pull request Jun 26, 2025

ethereum 1.16.0 Homebrew/homebrew-core#228279

Closed

rjl493456442 added a commit to rjl493456442/go-ethereum that referenced this pull request Jul 19, 2025

core/rawdb, triedb/pathdb: implement history indexer (ethereum#31156)

e28b6d8


allformless mentioned this pull request Aug 6, 2025

upstream: merge geth-v1.16.1 bnb-chain/bsc#3261

Merged

howjmay pushed a commit to iotaledger/go-ethereum that referenced this pull request Aug 27, 2025

core/rawdb, triedb/pathdb: implement history indexer (ethereum#31156)

1678057

