Block Database #4027
Open: DracoLi wants to merge 28 commits into master from dl/blockdb
Commits (28, all by DracoLi):

- 260979a blockdb setup & readme
- 64ca7f1 feat: block db implementation & readme
- cf35473 refactor: rename store to database
- 15ae1d1 feat: add tests and update blockdb to have separate methods to read h…
- c6989b0 feat: data splitting & fix linting
- c1bcf97 fix: close db before deleting the file
- 4201549 fix: recovery issues with data files splitting & feedback
- 9a90669 use lru for file cache and fix recovery issues
- decbfe8 refactor: use t.TempDir
- f08b7a7 fix: cache test
- cb900cf Merge branch 'master' into dl/blockdb
- dd98830 refactor: move database methods to database.go
- e1aa481 rename blockHeader -> blockEntryHeader and improve recovery logic
- 4107837 make MaxDataFiles configurable
- 4d2822b add more logging
- ef3fbb0 move data and index dir to config and rename config
- dcde3ee fix lint
- 4bf1935 fix struct alignment and add tests
- 6392142 fix: separate errors for directories
- 9eb635a consistent block height tracking
- 5f018b4 remove truncate config
- b9b8fa8 add additional tests
- 80e01d2 fix lint and improve test error msg
- 4752af3 remove assertion in go routine
- abce7c2 Merge branch 'master' into dl/blockdb
- 2530a83 limit concurrent calls to persistIndexHeader
- c20a237 add warning log if config values differ from index header
- 68f411f change warn logs to info
# BlockDB

BlockDB is a specialized database optimized for blockchain blocks.

## Key Functionalities

- **O(1) Performance**: Both reads and writes complete in constant time
- **Parallel Operations**: Multiple threads can read and write blocks concurrently without blocking
- **Flexible Write Ordering**: Supports out-of-order block writes for bootstrapping
- **Configurable Durability**: Optional `syncToDisk` mode guarantees immediate recoverability
- **Automatic Recovery**: Detects and recovers unindexed blocks after unclean shutdowns

## Design

BlockDB uses a single index file and multiple data files. The index file maps block heights to locations in the data files, while data files store the actual block content. Data storage can be split across multiple data files based on the maximum data file size.

```
┌─────────────────┐         ┌─────────────────┐
│   Index File    │         │  Data File 1    │
│   (.idx)        │         │    (.dat)       │
├─────────────────┤         ├─────────────────┤
│ Header          │         │ Block 0         │
│ - Version       │  ┌─────>│ - Header        │
│ - Min Height    │  │      │ - Data          │
│ - Max Height    │  │      ├─────────────────┤
│ - Data Size     │  │      │ Block 1         │
│ - ...           │  │  ┌──>│ - Header        │
├─────────────────┤  │  │   │ - Data          │
│ Entry[0]        │  │  │   ├─────────────────┤
│ - Offset ───────┼──┘  │   │     ...         │
│ - Size          │     │   └─────────────────┘
│ - Header Size   │     │
├─────────────────┤     │
│ Entry[1]        │     │
│ - Offset ───────┼─────┘
│ - Size          │
│ - Header Size   │
├─────────────────┤
│     ...         │
└─────────────────┘
```
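
As a rough illustration of how one logical offset space could be split across data files, the sketch below maps a global offset to a file index and a local offset, and moves an entry to the next file when it would cross a file boundary. The names `locate` and `placeEntry`, and the policy that an entry never straddles two files, are assumptions made for illustration; the README only states that data is split based on the maximum data file size.

```go
package main

import (
	"errors"
	"fmt"
)

// locate maps a global data offset to (data file index, offset inside that file).
// Hypothetical helper; maxDataFileSize must be non-zero.
func locate(globalOffset, maxDataFileSize uint64) (fileIndex, localOffset uint64) {
	return globalOffset / maxDataFileSize, globalOffset % maxDataFileSize
}

// placeEntry returns the global offset at which an entry of entrySize bytes
// would be written. Assumption: an entry never straddles two data files, so an
// entry that does not fit in the current file starts at the next file.
func placeEntry(nextOffset, entrySize, maxDataFileSize uint64) (uint64, error) {
	if maxDataFileSize == 0 {
		return 0, errors.New("max data file size must be non-zero")
	}
	if entrySize > maxDataFileSize {
		return 0, fmt.Errorf("entry of %d bytes exceeds max data file size %d", entrySize, maxDataFileSize)
	}
	_, local := locate(nextOffset, maxDataFileSize)
	if local+entrySize > maxDataFileSize {
		// Skip the unused tail of the current file.
		return nextOffset + (maxDataFileSize - local), nil
	}
	return nextOffset, nil
}

func main() {
	const maxSize = 1000
	off, err := placeEntry(900, 200, maxSize)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	file, local := locate(off, maxSize)
	fmt.Println(off, file, local) // prints "1000 1 0"
}
```

Under this assumed policy the tail of a full file is left unused, which keeps every block readable from a single file at the cost of a little wasted space.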

### File Formats

#### Index File Structure

The index file consists of a fixed-size header followed by fixed-size entries:

```
Index File Header (64 bytes):
┌────────────────────────────────┬─────────┐
│ Field                          │ Size    │
├────────────────────────────────┼─────────┤
│ Version                        │ 8 bytes │
│ Max Data File Size             │ 8 bytes │
│ Min Block Height               │ 8 bytes │
│ Max Contiguous Height          │ 8 bytes │
│ Max Block Height               │ 8 bytes │
│ Next Write Offset              │ 8 bytes │
│ Reserved                       │ 16 bytes│
└────────────────────────────────┴─────────┘

Index Entry (16 bytes):
┌────────────────────────────────┬─────────┐
│ Field                          │ Size    │
├────────────────────────────────┼─────────┤
│ Data File Offset               │ 8 bytes │
│ Block Data Size                │ 4 bytes │
│ Header Size                    │ 4 bytes │
└────────────────────────────────┴─────────┘
```
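
To make the 16-byte entry layout concrete, here is a minimal sketch of encoding and decoding one index entry. The `indexEntry` type, its method names, and the big-endian byte order are assumptions for illustration; the README specifies only the field order and sizes.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// indexEntrySize is the on-disk size of one index entry (8 + 4 + 4 bytes).
const indexEntrySize = 16

// indexEntry mirrors the three fields in the table above.
// The type and field names are hypothetical.
type indexEntry struct {
	Offset     uint64 // data file offset of the block entry
	Size       uint32 // block data size in bytes
	HeaderSize uint32 // size of the header portion of the block data
}

// marshal encodes the entry into exactly 16 bytes.
// Big-endian is an assumption; the README does not state the byte order.
func (e indexEntry) marshal() [indexEntrySize]byte {
	var buf [indexEntrySize]byte
	binary.BigEndian.PutUint64(buf[0:8], e.Offset)
	binary.BigEndian.PutUint32(buf[8:12], e.Size)
	binary.BigEndian.PutUint32(buf[12:16], e.HeaderSize)
	return buf
}

// unmarshalIndexEntry decodes an entry produced by marshal.
func unmarshalIndexEntry(buf [indexEntrySize]byte) indexEntry {
	return indexEntry{
		Offset:     binary.BigEndian.Uint64(buf[0:8]),
		Size:       binary.BigEndian.Uint32(buf[8:12]),
		HeaderSize: binary.BigEndian.Uint32(buf[12:16]),
	}
}

func main() {
	e := indexEntry{Offset: 4096, Size: 17, HeaderSize: 7}
	raw := e.marshal()
	fmt.Println(len(raw), unmarshalIndexEntry(raw) == e) // prints "16 true"
}
```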

#### Data File Structure

Each block in the data file is stored with a block entry header followed by the raw block data:

```
Block Entry Header (26 bytes):
┌────────────────────────────────┬─────────┐
│ Field                          │ Size    │
├────────────────────────────────┼─────────┤
│ Height                         │ 8 bytes │
│ Size                           │ 4 bytes │
│ Checksum                       │ 8 bytes │
│ Header Size                    │ 4 bytes │
│ Version                        │ 2 bytes │
└────────────────────────────────┴─────────┘
```
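
The following sketch shows how a block entry (header plus raw data) could be laid out in memory before being appended to a data file. The helper name `encodeBlockEntry`, the big-endian byte order, the version value, and the use of CRC-64 as the 8-byte checksum are all assumptions; the README does not name the checksum algorithm.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc64"
)

// blockEntryHeaderSize is the on-disk header size (8 + 4 + 8 + 4 + 2 bytes).
const blockEntryHeaderSize = 26

// crcTable backs the stand-in checksum. CRC-64 is an assumption chosen only
// because it is in the standard library and yields 8 bytes.
var crcTable = crc64.MakeTable(crc64.ISO)

// encodeBlockEntry returns the block entry header followed by the raw block.
// Field order follows the table above; the byte order is an assumption.
func encodeBlockEntry(height uint64, block []byte, headerSize uint32, version uint16) []byte {
	buf := make([]byte, blockEntryHeaderSize, blockEntryHeaderSize+len(block))
	binary.BigEndian.PutUint64(buf[0:8], height)
	binary.BigEndian.PutUint32(buf[8:12], uint32(len(block)))
	binary.BigEndian.PutUint64(buf[12:20], crc64.Checksum(block, crcTable))
	binary.BigEndian.PutUint32(buf[20:24], headerSize)
	binary.BigEndian.PutUint16(buf[24:26], version)
	return append(buf, block...)
}

func main() {
	block := []byte("header:block data") // 17 bytes, first 7 are the header
	entry := encodeBlockEntry(100, block, 7, 1)
	fmt.Println(len(entry)) // prints "43" (26-byte header + 17-byte block)
}
```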

### Block Overwrites

BlockDB allows overwriting blocks at existing heights. When a block is overwritten, the new block is appended to the data file and the index entry is updated to point to the new location, leaving the old block data as unreferenced "dead" space. Since blocks are normally immutable and overwrites are rare (e.g., during reorgs), this trade-off should have minimal impact in practice.

### Fixed-Size Index Entries

Each index entry is exactly 16 bytes on disk, containing the offset, size, and header size. This fixed size enables direct calculation of where each block's index entry is located, providing O(1) lookups. The index also stays manageable for blockchains with high block heights: even at height 1 billion, the index file would be only ~16 GB (1 billion entries × 16 bytes).
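
The direct calculation can be sketched as below. It assumes that entries start immediately after the 64-byte index header and are positioned relative to the Min Block Height stored in that header; the function name `indexEntryOffset` is hypothetical.

```go
package main

import "fmt"

const (
	indexHeaderSize = 64 // bytes, from the index file header table
	indexEntrySize  = 16 // bytes, from the index entry table
)

// indexEntryOffset returns the byte position in the index file of the entry
// for the given height. Assumption: entry i describes height minHeight + i.
func indexEntryOffset(height, minHeight uint64) (uint64, error) {
	if height < minHeight {
		return 0, fmt.Errorf("height %d is below the minimum height %d", height, minHeight)
	}
	return indexHeaderSize + (height-minHeight)*indexEntrySize, nil
}

func main() {
	off, err := indexEntryOffset(100, 0)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(off) // prints "1664" (64 + 100*16)

	end, err := indexEntryOffset(1_000_000_000, 0)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(end) // prints "16000000064", roughly 16 GB
}
```

Because the position depends only on arithmetic over the height, a lookup needs a single positioned read of 16 bytes regardless of how many blocks are stored.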

### Durability and Fsync Behavior

BlockDB provides configurable durability through the `syncToDisk` parameter:

**Data File Behavior:**

- **When `syncToDisk=true`**: The data file is fsync'd after every block write, guaranteeing durability against both process failures and kernel/machine failures.
- **When `syncToDisk=false`**: Data file writes are buffered, providing durability against process failures but not against kernel or machine failures.

**Index File Behavior:**

- **When `syncToDisk=true`**: The index file is fsync'd every `CheckpointInterval` blocks (when the header is written).
- **When `syncToDisk=false`**: The index file relies on OS buffering and is not explicitly fsync'd.
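
The fsync rules above can be sketched with plain `os.File` handles. This is a simplified illustration, not the BlockDB writer: the `blockWriter` type, the file names, and the `runDemo` helper are hypothetical, and the index header rewrite that would precede the index fsync is only indicated by a comment.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// blockWriter is a hypothetical, simplified writer used only for this sketch.
type blockWriter struct {
	dataFile           *os.File
	indexFile          *os.File
	syncToDisk         bool
	checkpointInterval uint64
	written            uint64 // blocks written so far
	indexSyncs         int    // number of index fsyncs, tracked for the demo
}

// writeBlock appends a block and applies the fsync policy described above.
func (w *blockWriter) writeBlock(block []byte) error {
	if w.checkpointInterval == 0 {
		return errors.New("checkpoint interval must be non-zero")
	}
	if _, err := w.dataFile.Write(block); err != nil {
		return fmt.Errorf("write data file: %w", err)
	}
	w.written++
	if !w.syncToDisk {
		return nil // rely on OS buffering for both files
	}
	if err := w.dataFile.Sync(); err != nil {
		return fmt.Errorf("fsync data file: %w", err)
	}
	if w.written%w.checkpointInterval == 0 {
		// A real implementation would rewrite the index header here first.
		if err := w.indexFile.Sync(); err != nil {
			return fmt.Errorf("fsync index file: %w", err)
		}
		w.indexSyncs++
	}
	return nil
}

// runDemo writes n 5-byte blocks into temporary files and reports the number
// of index fsyncs and the final data file size.
func runDemo(syncToDisk bool, n int, interval uint64) (int, int64, error) {
	dir, err := os.MkdirTemp("", "blockdb-sketch")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(dir)

	dataFile, err := os.Create(filepath.Join(dir, "blocks.dat"))
	if err != nil {
		return 0, 0, err
	}
	defer dataFile.Close()
	indexFile, err := os.Create(filepath.Join(dir, "blocks.idx"))
	if err != nil {
		return 0, 0, err
	}
	defer indexFile.Close()

	w := &blockWriter{dataFile: dataFile, indexFile: indexFile, syncToDisk: syncToDisk, checkpointInterval: interval}
	for i := 0; i < n; i++ {
		if err := w.writeBlock([]byte("block")); err != nil {
			return 0, 0, err
		}
	}
	info, err := dataFile.Stat()
	if err != nil {
		return 0, 0, err
	}
	return w.indexSyncs, info.Size(), nil
}

func main() {
	syncs, size, err := runDemo(true, 10, 4)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(syncs, size) // prints "2 50": index fsyncs after blocks 4 and 8
}
```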

### Recovery Mechanism

On startup, BlockDB checks for signs of an unclean shutdown by comparing the data file size on disk with the indexed data size stored in the index file header. If the data files are larger than what the index claims, it indicates that blocks were written but the index wasn't properly updated before shutdown.

**Recovery Process:**

1. Starts scanning from where the index left off (`NextWriteOffset`)
2. For each unindexed block found:
   - Validates the block entry header and checksum
   - Writes the corresponding index entry
3. Calculates the max contiguous height and max block height
4. Updates the index header with the updated max contiguous height, max block height, and next write offset
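
The recovery steps can be sketched over an in-memory byte slice standing in for a data file. Everything here is illustrative: the function names, the big-endian layout, the CRC-64 stand-in checksum, and the choice to stop at the first truncated or corrupt entry are assumptions rather than BlockDB's actual behavior.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc64"
)

const entryHeaderSize = 26 // block entry header size from the table above

var crcTable = crc64.MakeTable(crc64.ISO) // stand-in checksum (assumption)

// indexEntry is a hypothetical in-memory form of a 16-byte index entry.
type indexEntry struct {
	Offset     uint64
	Size       uint32
	HeaderSize uint32
}

// appendEntry appends a block entry (header + data) to data; demo helper.
func appendEntry(data []byte, height uint64, block []byte, headerSize uint32) []byte {
	hdr := make([]byte, entryHeaderSize)
	binary.BigEndian.PutUint64(hdr[0:8], height)
	binary.BigEndian.PutUint32(hdr[8:12], uint32(len(block)))
	binary.BigEndian.PutUint64(hdr[12:20], crc64.Checksum(block, crcTable))
	binary.BigEndian.PutUint32(hdr[20:24], headerSize)
	binary.BigEndian.PutUint16(hdr[24:26], 1)
	return append(append(data, hdr...), block...)
}

// recoverBlocks scans data from nextWriteOffset (step 1), validates and
// indexes each entry (step 2), and returns the new next write offset.
func recoverBlocks(data []byte, nextWriteOffset uint64, index map[uint64]indexEntry) uint64 {
	off := nextWriteOffset
	for off+entryHeaderSize <= uint64(len(data)) {
		hdr := data[off : off+entryHeaderSize]
		height := binary.BigEndian.Uint64(hdr[0:8])
		size := binary.BigEndian.Uint32(hdr[8:12])
		sum := binary.BigEndian.Uint64(hdr[12:20])
		headerSize := binary.BigEndian.Uint32(hdr[20:24])
		end := off + entryHeaderSize + uint64(size)
		if end > uint64(len(data)) || headerSize > size {
			break // truncated or malformed entry
		}
		if crc64.Checksum(data[off+entryHeaderSize:end], crcTable) != sum {
			break // corrupt block data
		}
		index[height] = indexEntry{Offset: off, Size: size, HeaderSize: headerSize}
		off = end
	}
	return off
}

// summarize computes the heights for step 3. ok is false when the block at
// minHeight is missing, in which case there is no contiguous range yet.
func summarize(index map[uint64]indexEntry, minHeight uint64) (maxContiguous, maxHeight uint64, ok bool) {
	for h := range index {
		if h > maxHeight {
			maxHeight = h
		}
	}
	if _, found := index[minHeight]; !found {
		return 0, maxHeight, false
	}
	maxContiguous = minHeight
	for {
		if _, found := index[maxContiguous+1]; !found {
			return maxContiguous, maxHeight, true
		}
		maxContiguous++
	}
}

func main() {
	var data []byte
	data = appendEntry(data, 0, []byte("hdr:a"), 4)
	index := map[uint64]indexEntry{0: {Offset: 0, Size: 5, HeaderSize: 4}}
	indexed := uint64(len(data)) // NextWriteOffset recorded before the crash

	data = appendEntry(data, 1, []byte("hdr:b"), 4) // written but not indexed
	data = appendEntry(data, 3, []byte("hdr:d"), 4) // out-of-order write
	data = append(data, 0xFF, 0xFF, 0xFF)           // torn write at the tail

	next := recoverBlocks(data, indexed, index)
	contiguous, highest, _ := summarize(index, 0)
	fmt.Println(next, contiguous, highest) // prints "93 1 3"
}
```

Step 4 would then persist `next`, `contiguous`, and `highest` into the index header.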

## Usage

### Creating a Database

```go
import (
	"errors" // used by the read examples below
	"fmt"

	"github.com/ava-labs/avalanchego/utils/logging"
	"github.com/ava-labs/avalanchego/x/blockdb"
)

// Start from the default configuration and point it at a directory.
config := blockdb.DefaultConfig().
	WithDir("/path/to/blockdb")
db, err := blockdb.New(config, logging.NoLog{})
if err != nil {
	fmt.Println("Error creating database:", err)
	return
}
defer db.Close()
```

### Writing and Reading Blocks

```go
// Write a block with header size
height := uint64(100)
blockData := []byte("header:block data")
headerSize := uint32(7) // First 7 bytes are the header
if err := db.WriteBlock(height, blockData, headerSize); err != nil {
	fmt.Println("Error writing block:", err)
	return
}

// Read a block (header and body together)
readData, err := db.ReadBlock(height)
if err != nil {
	if errors.Is(err, blockdb.ErrBlockNotFound) {
		fmt.Println("Block doesn't exist at this height")
		return
	}
	fmt.Println("Error reading block:", err)
	return
}

// Read block components separately
headerData, err := db.ReadHeader(height)
if err != nil {
	if errors.Is(err, blockdb.ErrBlockNotFound) {
		fmt.Println("Block doesn't exist at this height")
		return
	}
	fmt.Println("Error reading header:", err)
	return
}
bodyData, err := db.ReadBody(height)
if err != nil {
	if errors.Is(err, blockdb.ErrBlockNotFound) {
		fmt.Println("Block doesn't exist at this height")
		return
	}
	fmt.Println("Error reading body:", err)
	return
}
fmt.Printf("block=%q header=%q body=%q\n", readData, headerData, bodyData)
```

## TODO

- Implement a block cache for recently accessed blocks
- Use a buffer pool to avoid allocations on reads and writes
- Add metrics
- Add performance benchmarks
- Consider supporting missing data files (currently we error if any data files are missing)