FIP: Cross-shard communication in Snapchain #241
aditiharini announced in FIP Stage 3: Review
Replies: 2 comments 3 replies
-
How about using onchain events?
-
Reading this and FIP-240 again. I realize that what triggered both is design choices related to fnames. What exactly?
Problem
There are cases where we want some form of cross-shard communication in Snapchain, most importantly for Storage Delegation: a user in one shard may want to delegate storage to a user in another shard. Cross-shard communication is also required to process fname transfers correctly (currently we work around this by having every shard keep a copy of all fname transfers). It would also allow us to remove the fname server and bring it into Snapchain in the future.
Design
A simple approach to cross-shard communication is to store all global data in shard 0 and have it send signed messages to the individual shards telling them to change their state. Note this is not a true “cross-shard” solution, in that shards don’t talk to each other directly; communication relies on shard 0 forwarding all messages to the right shards. It is possible to extend this to actual shard-to-shard messaging, with shard 0 acting as a hop in the middle, but we don’t currently anticipate any need for that kind of communication.
At a high level, this design is meant for low-throughput, latency-tolerant operations. It does not work for cases where cross-shard transfers happen very frequently or need to be fast (e.g. blocking a reply to a cast until the cast is included in another shard). It increases confirmation times of cross-shard messages to 3 blocks across all shards in the ideal case, but shards still run independently and normal messages are not affected.
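To make the hub-and-spoke flow concrete, here is a minimal Rust sketch. The type names (`GlobalMessage`, `BlockEvent`, `Shard0Block`) and their fields are illustrative assumptions rather than actual Snapchain definitions; the point is only that global state changes are processed on shard 0, turned into block events, and carried to the other shards inside the signed shard 0 block.

```rust
/// A user message that touches global state (e.g. a storage delegation
/// from one fid to another). Hypothetical type for illustration.
struct GlobalMessage {
    from_fid: u64,
    to_fid: u64,
    units: u64,
}

/// Event emitted by shard 0 and embedded directly in its block.
struct BlockEvent {
    seq: u64,
    payload: GlobalMessage,
}

/// The shard 0 block that gets queued into the other shards' mempools.
struct Shard0Block {
    block_number: u64,
    events: Vec<BlockEvent>,
}

/// Shard 0 turns the global messages it processed into block events and
/// embeds them in its next block; each shard applies them when it sees
/// that block among its own transactions.
fn build_shard0_block(block_number: u64, msgs: Vec<GlobalMessage>) -> Shard0Block {
    let events = msgs
        .into_iter()
        .enumerate()
        .map(|(i, payload)| BlockEvent { seq: i as u64, payload })
        .collect();
    Shard0Block { block_number, events }
}
```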
Specification
Prerequisite
Currently shard 0 waits for the matching block number on each shard. This means shard 0 runs at the speed of the slowest shard and lags behind shards with higher block numbers, so it does not have a view of the latest state of each shard. We need to fix that before we can enable cross-shard communication.
Shard 0 should now wait up to 500ms (block_time / 2) to receive a confirmed block from both shards. If it receives a block with a higher number than the previously witnessed blocks, it will validate it and commit the header into its own block.
If it does not receive a block from one or more of the shards, shard 0 will still commit the rest of the blocks it witnessed. If it receives no blocks, it will commit an empty witness.
At a future point, when it receives the missing blocks, it will pick the latest block number it has received for that shard and commit that header, validating all the intermediate blocks for that shard along the way. Shard 0 witnesses therefore serve as a high-watermark proof: all block numbers below that height are considered valid even if their headers were never individually witnessed.
If a block is missing or invalid after a certain period of time (10 mins?), then shard 0 must halt. This is to ensure that other shards will not receive out of date cross shard messages while one or more shards are down.
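A sketch of this witnessing rule follows, using assumed names (`ShardWitness`, `commit_witnesses`) that are not real Snapchain APIs. The 500ms wait (block_time / 2) and the roughly 10 minute halt window come from the text above; everything else is illustrative.

```rust
use std::time::{Duration, Instant};

/// How long a shard may go without producing a valid block before shard 0 halts.
const HALT_AFTER: Duration = Duration::from_secs(10 * 60);

/// Tracks, per shard, the highest block header shard 0 has committed.
struct ShardWitness {
    high_watermark: Vec<u64>,
    last_seen: Vec<Instant>,
}

impl ShardWitness {
    /// Called once per shard 0 block, after waiting up to 500ms for confirmed
    /// blocks. `confirmed[shard]` is the highest confirmed block number seen
    /// for that shard during the wait, or None if nothing arrived.
    fn commit_witnesses(&mut self, confirmed: &[Option<u64>]) -> Result<(), &'static str> {
        let now = Instant::now();
        for (shard, block) in confirmed.iter().enumerate() {
            match block {
                // Commit the header only if it advances the high watermark; the
                // intermediate blocks are validated first, so the watermark proves
                // everything below it is valid.
                Some(height) if *height > self.high_watermark[shard] => {
                    self.high_watermark[shard] = *height;
                    self.last_seen[shard] = now;
                }
                Some(_) => self.last_seen[shard] = now,
                // Nothing from this shard: commit the remaining witnesses as-is,
                // but halt shard 0 if the shard has been silent for too long so
                // other shards never act on out-of-date cross-shard messages.
                None if now.duration_since(self.last_seen[shard]) > HALT_AFTER => {
                    return Err("missing or invalid blocks for too long; halting shard 0");
                }
                None => {}
            }
        }
        Ok(())
    }
}
```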
Protobuf changes
BlockEngine Changes
Shard 0 needs a MerkleTrie and an onchain event store. It needs to know about valid signers so it can validate StorageLend user messages. Once these are added, the onchain events have to be backfilled on this shard.
When Shard 0 receives a StorageLend message, it will process it similarly to how stores work on the other shards. The main difference is that these messages will emit a BlockEvent instead of a HubEvent, and these block events will be included directly in the block. Once a block is decided, if it contains block events, it will be queued for inclusion into each shard’s mempool.
When shard 0 processes commits from the other shards, if it has previously submitted a block into the mempool, it will monitor the decided blocks to make sure that its block is present in the transactions list. Only once the block is confirmed to have been included in all the shards will the next shard 0 block be published to the mempool. This ensures that the shards receive the block events in order.
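A sketch of this in-order delivery gate, under assumed names (`BlockEventForwarder`, `PendingBlock`) that are not Snapchain APIs: shard 0 keeps at most one event-carrying block outstanding and only publishes the next one once every shard has included the previous block in a decided block.

```rust
use std::collections::HashSet;

/// The shard 0 block currently waiting to be included by every shard.
struct PendingBlock {
    block_number: u64,
    /// Shards that have already included this block in a decided block.
    included_in: HashSet<u32>,
}

struct BlockEventForwarder {
    shards: Vec<u32>,
    pending: Option<PendingBlock>,
}

impl BlockEventForwarder {
    /// Called whenever shard 0 processes a commit from another shard.
    /// `contains_pending` is true if that shard's decided block carried the
    /// pending shard 0 block in its transactions list.
    fn on_shard_block_decided(&mut self, shard: u32, contains_pending: bool) {
        if let Some(pending) = &mut self.pending {
            if contains_pending {
                pending.included_in.insert(shard);
            }
        }
    }

    /// The next event-carrying shard 0 block may only be queued for the shard
    /// mempools once the previous one is confirmed everywhere.
    fn can_publish_next(&self) -> bool {
        match &self.pending {
            None => true,
            Some(p) => self.shards.iter().all(|s| p.included_in.contains(s)),
        }
    }
}
```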
ShardEngine Changes
When the ShardEngine pulls transactions from the mempool, if it sees a block from shard 0 in the system messages, it first validates the block hash and the signatures to ensure it’s a valid block. Then, it processes the BlockEvents from the block in order, before processing any user messages.
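A sketch of this ordering on the ShardEngine side. The types and the signature-check stand-in are assumptions for illustration; the property being shown is that the shard 0 block is validated first, its block events are applied in order, and only then are ordinary user messages processed.

```rust
/// Stand-in for a decoded block event from a shard 0 block.
struct BlockEvent {
    seq: u64,
    payload: Vec<u8>,
}

/// Process one batch of transactions pulled from the mempool. The shard 0
/// block, if present, arrives with a flag indicating whether its hash and
/// signatures were verified (a stand-in for real commit-certificate checks).
fn process_transactions(
    shard0_block: Option<(&[BlockEvent], bool)>,
    user_messages: &[Vec<u8>],
) -> Result<(), &'static str> {
    if let Some((events, valid)) = shard0_block {
        // 1. Validate the shard 0 block hash and signatures before trusting it.
        if !valid {
            return Err("invalid shard 0 block in system messages");
        }
        // 2. Apply its block events in the order they appear in the block.
        for event in events {
            apply_block_event(event);
        }
    }
    // 3. Only then process normal user messages.
    for msg in user_messages {
        apply_user_message(msg);
    }
    Ok(())
}

fn apply_block_event(_e: &BlockEvent) { /* cross-shard state transition */ }
fn apply_user_message(_m: &[u8]) { /* normal message processing */ }
```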
Pros and Cons
Pro
Cons
Rollout plan
This is a complicated change that will require multiple protocol releases. We would need to break it up into:
Future Work
Once we have this in place, we could move fname transfers into Shard 0 as well and remove the requirement for a separate FName server.
It might also be worth considering whether we can move all OnchainEvents to Shard 0 and propagate changes via the messages mechanism instead of having each shard listen to onchain state changes.
Alternatives Considered
Passing Merkle Proofs
This is the more standard approach. We could do this, but it’s more work and we would have to design the stores to merkleize all state, which we don’t do currently. Since we’re going to prune blocks anyway, it should be acceptable to put the events directly into the block; they will be cleaned up eventually.
Design
To implement this, we’d need to merkleize all state on shard 0 that we’d need to communicate. The easiest way to do this would be to re-use the merkle trie. This would mean creating a new store type that uses the merkle trie, where every put inserts into or updates the trie and every delete removes from it.
Then, when another shard needs to know about state in shard 0, we’d pass the block header (which contains the commit certificates and the shard root), the value it needs to know about, and the path/sibling hashes of the value up to the root. The shard would validate the proof by checking the commit certificates on the block header, recomputing the root from the value and the sibling hashes, and comparing it against the shard root in the header.
At this point the data is considered valid and the shard can perform operations on the data.
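A sketch of the proof check under this alternative, assuming a binary trie and SHA-256 (via the `sha2` crate); the real trie layout and hash function would differ. The `siblings` list carries, for each level, the sibling hash and whether the current node is the left child.

```rust
use sha2::{Digest, Sha256};

/// Hash two child nodes into their parent node.
fn hash_pair(left: &[u8; 32], right: &[u8; 32]) -> [u8; 32] {
    let mut h = Sha256::new();
    h.update(left);
    h.update(right);
    h.finalize().into()
}

/// Recompute the root from the leaf value and its sibling hashes, then compare
/// against the shard root carried in the (already signature-checked) header.
fn verify_proof(value: &[u8], siblings: &[([u8; 32], bool)], shard_root: &[u8; 32]) -> bool {
    let mut acc: [u8; 32] = Sha256::digest(value).into();
    for (sibling, current_is_left) in siblings {
        acc = if *current_is_left {
            hash_pair(&acc, sibling)
        } else {
            hash_pair(sibling, &acc)
        };
    }
    &acc == shard_root
}
```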
Downsides
Replicate state across all shards
Design
For messages that are relevant to multiple shards (e.g. fname transfers from one fid to another, or storage lending from one fid to another), send the messages to all shards via the mempool.
Problems