Description
Rationale
The block production mechanism adopts a minimum participation strategy: when the participation rate (the percentage of recent block slots that were actually filled) falls below the minimum participation rate, SRs stop producing blocks. The default configuration is minParticipationRate = 15, so as long as at least 15% of SRs work normally, SRs continue to produce blocks as usual.
The block production thread performs the following check:
```java
// Percentage of recent block slots that were filled.
int participation = consensusDelegate.calculateFilledSlotsCount();
int minParticipationRate = dposService.getMinParticipationRate();
if (participation < minParticipationRate) {
    return State.LOW_PARTICIPATION;
}
```
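To make the participation check concrete, here is a minimal, hypothetical Java sketch. It assumes that calculateFilledSlotsCount() reports the percentage of recently scheduled block slots that were actually filled, over a sliding window. The class name, the 128-slot window, and the round-robin pattern in which offline SRs miss their slots are illustrative assumptions, not java-tron code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: a sliding window of slot outcomes (true = block produced)
// and the same "participation < minParticipationRate" rule shown above.
final class ParticipationSketch {
    static final int WINDOW = 128; // assumed window size

    private final Deque<Boolean> slots = new ArrayDeque<>();

    ParticipationSketch() {
        // Start with a fully filled window.
        for (int i = 0; i < WINDOW; i++) {
            slots.addLast(true);
        }
    }

    // Drop the oldest slot and record the newest one.
    void recordSlot(boolean filled) {
        slots.removeFirst();
        slots.addLast(filled);
    }

    // Percentage of filled slots in the window, rounded down.
    int participationPercent() {
        long filled = slots.stream().filter(b -> b).count();
        return (int) (100 * filled / WINDOW);
    }

    boolean lowParticipation(int minParticipationRate) {
        return participationPercent() < minParticipationRate;
    }

    public static void main(String[] args) {
        ParticipationSketch s = new ParticipationSketch();
        // Assumption: 27 SRs in round-robin order, the last 10 in each round are offline.
        for (int i = 0; i < WINDOW; i++) {
            s.recordSlot(i % 27 < 17);
        }
        System.out.println(s.participationPercent()); // prints 66
        System.out.println(s.lowParticipation(15));   // prints false
    }
}
```

Under these assumptions, 17 of 27 SRs online yields 66% participation, well above the 15% default, so block production continues.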
Block solidification mechanism: a block can be solidified only after it has been confirmed by 70% of SRs. The code is shown below.
```java
private void updateSolidBlock() {
    // Latest block number produced by each active SR, sorted in ascending order.
    List<Long> numbers = consensusDelegate.getActiveWitnesses().stream()
        .map(address -> consensusDelegate.getWitness(address.toByteArray()).getLatestBlockNum())
        .sorted()
        .collect(Collectors.toList());
    long size = consensusDelegate.getActiveWitnesses().size();
    // With SOLIDIFIED_THRESHOLD = 70, at least 70% of active SRs have a latest
    // block number >= numbers.get(position).
    int position = (int) (size * (1 - SOLIDIFIED_THRESHOLD * 1.0 / 100));
    long newSolidNum = numbers.get(position);
    long oldSolidNum = consensusDelegate.getLatestSolidifiedBlockNum();
    if (newSolidNum < oldSolidNum) {
        logger.warn("Update solid block number failed, new: {} < old: {}", newSolidNum, oldSolidNum);
        return;
    }
    CommonParameter.getInstance()
        .setOldSolidityBlockNum(consensusDelegate.getLatestSolidifiedBlockNum());
    consensusDelegate.saveLatestSolidifiedBlockNum(newSolidNum);
    logger.info("Update solid block number to {}", newSolidNum);
}
```
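The position arithmetic in updateSolidBlock() can be checked with a small worked example. The following Java sketch is a hypothetical pure-function version of the selection logic only. It assumes SOLIDIFIED_THRESHOLD = 70 and a simplified state in which stopped SRs are all stuck at one block number while active SRs all report the same head. With 27 SRs, position = (int) (27 × 0.3) = 8, so the solid number is the 9th-lowest latest block number.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the position/selection rule from updateSolidBlock().
final class SolidBlockSketch {
    static final int SOLIDIFIED_THRESHOLD = 70; // percent

    // Sort the SRs' latest block numbers and pick the one at the threshold position.
    static long solidNum(List<Long> latestBlockNums) {
        List<Long> numbers = new ArrayList<>(latestBlockNums);
        Collections.sort(numbers);
        int size = numbers.size();
        int position = (int) (size * (1 - SOLIDIFIED_THRESHOLD * 1.0 / 100));
        return numbers.get(position);
    }

    // `stopped` SRs halted at `stuckAt`; the remaining SRs reached `head`.
    static List<Long> witnesses(int total, int stopped, long stuckAt, long head) {
        List<Long> nums = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            nums.add(i < stopped ? stuckAt : head);
        }
        return nums;
    }

    public static void main(String[] args) {
        // 17 active SRs (as in the experiment below): the solid number stays at 100.
        System.out.println(solidNum(witnesses(27, 10, 100L, 200L))); // prints 100
        // 19 active SRs: the solid number advances to 200.
        System.out.println(solidNum(witnesses(27, 8, 100L, 200L)));  // prints 200
    }
}
```

In this simplified model, the solid number advances only when at least 19 of 27 SRs (about 70.4%) have reached the new block.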
Background
There is a conflict between the block production mechanism and the solidification mechanism. When the participation rate is at least 15% but below 70%, SRs keep producing blocks, yet those blocks cannot be solidified. Because unsolidified blocks are kept in memory, memory may eventually be exhausted.
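The range of SR counts that falls into this gap can be enumerated with a small, hypothetical Java sketch. It assumes that participation roughly equals the share of SRs that are online, and it reuses the position rule from updateSolidBlock() under the simplification that stopped SRs always report lower block numbers than active ones. The class and method names are illustrative.

```java
// Hypothetical sketch: which numbers of active SRs produce blocks that never solidify.
final class ProductionGapSketch {
    static final int MIN_PARTICIPATION_RATE = 15; // default from the text
    static final int SOLIDIFIED_THRESHOLD = 70;   // percent

    // Assumption: participation is approximately the percentage of SRs online.
    static boolean canProduce(int active, int total) {
        return 100 * active / total >= MIN_PARTICIPATION_RATE;
    }

    // Stopped SRs occupy the lowest indices of the sorted list, so the solid
    // number advances only if the SR at `position` is an active one.
    static boolean canSolidify(int active, int total) {
        int position = (int) (total * (1 - SOLIDIFIED_THRESHOLD * 1.0 / 100));
        int stopped = total - active;
        return position >= stopped;
    }

    public static void main(String[] args) {
        int total = 27;
        for (int active = 0; active <= total; active++) {
            if (canProduce(active, total) && !canSolidify(active, total)) {
                System.out.println(active + " active SRs: blocks produced but not solidified");
            }
        }
    }
}
```

Under these assumptions, with 27 SRs, anywhere from 5 to 18 active SRs keeps the chain growing while the solidified block number stays put.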
Experiment Process
The experiment uses 27 mainnet SRs to build an unfinalized scenario in which SR block production is maximized.
- Hardware environment: 16-core CPU, 32 GB RAM server, 24 GB heap for each SR.
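- Network environment: 17 SRs kept running and 10 SRs stopped (17/27 ≈ 63% participation: above the 15% block production threshold but below the 70% solidification threshold).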
- Experiment process:
Starting from block 52873440, the network was stress-tested with a high volume of transactions. By block 52874137, SR block production performance had dropped to around 300 txs/block. As stress testing continued, packaging performance kept declining. After 5 days it fell to 2-3 txs per block, and eventually an OOM (out-of-memory) error occurred (on SR014, on the morning of July 25).
Experiment Results
Conclusion 1: in the unfinalized scenario, after 697 blocks (840,000 transactions), the SR packaging performance dropped below 300 txs per block.
Conclusion 2: in the unfinalized scenario, SR packaging performance declines linearly. When it fell to 2 transactions per block, an OOM occurred. In total, about 1.5 million transactions were packaged.
If too many blocks remain unsolidified, chain recovery becomes very slow, because a restarted node must synchronize a large number of unsolidified blocks.
The following log comes from an online node.
```
00:00:00.006 INFO [sync-handle-block] [DB](Manager.java:1341) PushBlock block number: 50355146, cost/txs: 132/125 false.
00:30:00.146 INFO [sync-handle-block] [DB](Manager.java:1341) PushBlock block number: 50364955, cost/txs: 201/280 false.
01:00:00.313 INFO [sync-handle-block] [DB](Manager.java:1341) PushBlock block number: 50370343, cost/txs: 406/448 false.
05:59:59.828 INFO [sync-handle-block] [DB](Manager.java:1341) PushBlock block number: 50435815, cost/txs: 283/312 false.
12:18:48.894 INFO [sync-handle-block] [DB](Manager.java:1341) PushBlock block number: 50531191, cost/txs: 123/154 false.
23:59:59.883 INFO [sync-handle-block] [DB](Manager.java:1341) PushBlock block number: 50692835, cost/txs: 214/278 false.
```
Block synchronization progress over time is summarized below.

| Elapsed time (hours) | Blocks synchronized |
|---|---|
| 0.5 | 9809 |
| 1 | 15197 |
| 6 | 80669 |
| 12 | 176045 |
| 24 | 337689 |
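The table can be converted into average synchronization rates with simple arithmetic. The following Java snippet only re-derives numbers from the table; the class and method names are illustrative.

```java
// Re-derives average synchronization rates from the table above.
final class SyncRateSketch {
    // Elapsed hours and cumulative blocks synchronized, copied from the table.
    static final double[] HOURS = {0.5, 1, 6, 12, 24};
    static final long[] BLOCKS = {9809, 15197, 80669, 176045, 337689};

    // Average blocks synchronized per second over the elapsed time of a row.
    static double averageBlocksPerSecond(int row) {
        return BLOCKS[row] / (HOURS[row] * 3600);
    }

    public static void main(String[] args) {
        for (int i = 0; i < HOURS.length; i++) {
            System.out.printf("%5.1f h: %.2f blocks/s%n", HOURS[i], averageBlocksPerSecond(i));
        }
    }
}
```

Over the full 24 hours, the node synchronized about 3.9 blocks per second on average.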
Implementation
When the number of unsolidified blocks reaches a threshold, transaction broadcasting is stopped, preventing SRs from packaging too many transactions that cannot be solidified. This has the following benefits:
- Avoids caching too many unsolidified transactions, which could exhaust memory.
- With fewer transactions per block, blocks execute faster, so block synchronization and chain recovery are faster.
- Avoids introducing excessive dirty data, which makes data rollback easier.
The implementation is as follows.
Add a function that checks how far the head block is ahead of the solidified block:
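```java
public boolean unsolidifiedBlockCheck() {
    // The check can be switched off entirely.
    if (!unsolidifiedBlockCheck) {
        return false;
    }
    long headNum = chainBaseManager.getHeadBlockNum();
    long solidNum = chainBaseManager.getSolidBlockId().getNum();
    // True when the head is at least maxUnsolidifiedBlocks ahead of the solid block.
    return headNum - solidNum >= maxUnsolidifiedBlocks;
}
```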
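The threshold rule in unsolidifiedBlockCheck() can be exercised in isolation. This Java sketch is a hypothetical standalone version: the class name is invented, the enable switch and threshold are constructor parameters rather than node configuration, and the threshold of 1000 used in the example is arbitrary. The block numbers come from the experiment above.

```java
// Hypothetical standalone version of the unsolidifiedBlockCheck() rule.
final class UnsolidifiedGuard {
    private final boolean enabled;            // stands in for the unsolidifiedBlockCheck switch
    private final long maxUnsolidifiedBlocks; // threshold; example value below is arbitrary

    UnsolidifiedGuard(boolean enabled, long maxUnsolidifiedBlocks) {
        this.enabled = enabled;
        this.maxUnsolidifiedBlocks = maxUnsolidifiedBlocks;
    }

    // True means "stop accepting new transactions".
    boolean shouldReject(long headNum, long solidNum) {
        if (!enabled) {
            return false;
        }
        return headNum - solidNum >= maxUnsolidifiedBlocks;
    }

    public static void main(String[] args) {
        UnsolidifiedGuard guard = new UnsolidifiedGuard(true, 1000);
        System.out.println(guard.shouldReject(52_874_137L, 52_873_440L)); // 697 behind: prints false
        System.out.println(guard.shouldReject(52_874_440L, 52_873_440L)); // 1000 behind: prints true
    }
}
```

Because the comparison uses `>=`, transactions are rejected as soon as the gap equals the threshold.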
When a transaction is broadcast and the number of unsolidified blocks has reached the threshold, a failure response is returned immediately:
```java
if (tronNetDelegate.unsolidifiedBlockCheck()) {
    logger.warn("Broadcast transaction {} has failed, block unsolidified.", txID);
    return builder.setResult(false).setCode(response_code.BLOCK_UNSOLIDIFIED)
        .setMessage(ByteString.copyFromUtf8("Block unsolidified."))
        .build();
}
```
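The fail-fast behavior of the broadcast path can be illustrated with a hypothetical Java sketch. The ResponseCode enum and Result class are stand-ins invented for this example (only BLOCK_UNSOLIDIFIED and the message text come from the snippet above), and the threshold check is inlined rather than delegated to tronNetDelegate.

```java
// Hypothetical sketch of the broadcast gate; types are stand-ins, not java-tron classes.
final class BroadcastSketch {
    enum ResponseCode { SUCCESS, BLOCK_UNSOLIDIFIED }

    static final class Result {
        final boolean ok;
        final ResponseCode code;
        final String message;

        Result(boolean ok, ResponseCode code, String message) {
            this.ok = ok;
            this.code = code;
            this.message = message;
        }
    }

    static Result broadcast(String txId, long headNum, long solidNum, long maxUnsolidifiedBlocks) {
        if (headNum - solidNum >= maxUnsolidifiedBlocks) {
            // Reject before any further processing of the transaction.
            System.err.printf("Broadcast transaction %s has failed, block unsolidified.%n", txId);
            return new Result(false, ResponseCode.BLOCK_UNSOLIDIFIED, "Block unsolidified.");
        }
        // Normal validation and propagation would happen here.
        return new Result(true, ResponseCode.SUCCESS, "");
    }

    public static void main(String[] args) {
        Result r = broadcast("tx-1", 2_000L, 500L, 1_000L);
        System.out.println(r.ok + " " + r.code + " " + r.message); // prints false BLOCK_UNSOLIDIFIED Block unsolidified.
    }
}
```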
When a transaction (TRX) inventory message is received and the number of unsolidified blocks has reached the threshold, the message is dropped:
```java
if (type.equals(InventoryType.TRX) && tronNetDelegate.unsolidifiedBlockCheck()) {
    logger.warn("Drop inv: {} size: {} from Peer {}, block unsolidified",
        type, size, peer.getInetAddress());
    return false;
}
```
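Note that the inventory check applies only to the TRX type, so block inventories are still processed, presumably so that solidification can resume once enough SRs return. The following hypothetical Java sketch illustrates that filtering rule; the InventoryType enum here is a stand-in with just two values, and the class and method names are invented.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of the inventory filter: only transaction inventories are dropped.
final class InventoryFilterSketch {
    enum InventoryType { TRX, BLOCK }

    static boolean accept(InventoryType type, boolean unsolidified) {
        return !(type == InventoryType.TRX && unsolidified);
    }

    static List<InventoryType> filter(List<InventoryType> incoming, boolean unsolidified) {
        return incoming.stream()
            .filter(t -> accept(t, unsolidified))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<InventoryType> incoming =
            List.of(InventoryType.TRX, InventoryType.BLOCK, InventoryType.TRX);
        System.out.println(filter(incoming, true));  // prints [BLOCK]
        System.out.println(filter(incoming, false)); // prints [TRX, BLOCK, TRX]
    }
}
```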