Benchmarking -- Node 10.7.1
Setup
As part of the release benchmarking cycle, we're comparing benchmarking runs for 2 different versions of cardano-node:
- 10.6.2: the current Node 10.6 performance baseline
- 10.7.1: the latest Node 10.7 release
For this benchmark, we're gathering various metrics under 2 different workloads:
- value-only (referred to below as the saturation workload): Each txn consumes 2 inputs and creates 2 outputs, changing the UTxO set. Full blocks (> 80kB) exclusively; high submission pressure (TPS > 10).
- Plutus: Each txn contains a Plutus script exhausting the per-tx execution budget. Small blocks (< 3kB) exclusively; low submission pressure (TPS < 1).
Benchmarking is performed on a cluster of 52 block producing nodes spread across 3 different AWS regions, interconnected using a static, restricted topology. All runs were performed in the Conway era using the in-memory LedgerDB backend.
Observations
These benchmarks are about evaluating specific corner cases in a constrained environment that allows for reliable reproduction of results; they're not trying to directly recreate the operational conditions on Mainnet.
Resource Usage
- 10.7.1 exhibits a massive reduction in Process CPU usage -- 46% under saturation workload, and 71% under Plutus workload.
- Allocation rate and Minor GCs increase moderately (by 14% - 16%) under saturation workload; however, GC CPU usage still shows a 4% improvement.
- Under Plutus workload, they both slightly decrease (by 4%), and GC CPU usage improves by 16%.
- Major GC events occur less frequently, by 16% - 18%, depending on workload.
- Observed CPU 85% spans exhibit a decrease in duration as well -- ~1.3 - ~1.4 slots depending on workload.
- Average RAM usage exhibits a slight increase by 2% for either workload.
Caveat: Individual metrics can't be evaluated in isolation; the resource usage profile as a whole provides insight into the system's performance and responsiveness.
Anomaly control
- Under saturation, Height & Slot battles on 10.7.1 occur 12% less frequently.
Forging Loop
- Under saturation workload, there are no significant changes in block production metrics.
- Under Plutus workload, there are various small improvements in Ledger Ticking (4ms), Self Adoption (2ms) and Forged to Sending (1ms).
- As a result, a block producer is able to announce a new header 3ms or 13% earlier into a slot (Plutus workload only).
The metric 'Slot start to announced' (see in attachments) is cumulative, and demonstrates how far into a slot the block producing node first announces the new header.
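To illustrate what "cumulative" means for such a metric, the sketch below sums per-phase durations into a running offset from slot start and computes a relative change between two runs. The phase names and millisecond values are placeholders invented for illustration; they are not measurements from these benchmarks, and the actual set of forging loop phases in the attached reports may differ.

```python
from itertools import accumulate

# Hypothetical phase durations in milliseconds, in the order they occur
# after slot start. Names and values are illustrative assumptions only.
phases = [
    ("phase_a", 5.0),
    ("phase_b", 10.0),
    ("phase_c", 8.0),
]

def cumulative_offsets(phases):
    """Return (name, offset_ms) pairs, where offset_ms is the time since
    slot start at which each phase completes."""
    names = [name for name, _ in phases]
    durations = [ms for _, ms in phases]
    return list(zip(names, accumulate(durations)))

def relative_change(baseline_ms, candidate_ms):
    """Signed relative change of candidate vs. baseline (negative = faster)."""
    return (candidate_ms - baseline_ms) / baseline_ms
```

With these placeholder values, the last cumulative offset is 23.0 ms, and `relative_change(23.0, 20.0)` is roughly -0.13, i.e. a 3 ms saving on that hypothetical baseline.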
Peer propagation
- Under saturation workload, Block Fetch duration decreases by 7ms or 2%.
- Average Adoption times on the peers improve by 2ms / 3ms (3% / 8%), depending on workload.
End-to-end propagation
This metric encompasses block diffusion and adoption across specific percentages of the benchmarking cluster, with 0.80 adoption meaning adoption on 80% of all cluster nodes.
- Cluster adoption metrics on 10.7.1 exhibit a small but consistent improvement by 2% - 3% for either workload in the 80th centile and above.
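One way to read a cluster adoption figure such as 0.80 is sketched below: given the per-node adoption delays for a single block, it returns the delay by which the requested fraction of all cluster nodes had adopted it. This is a minimal sketch and not the analysis tooling used for these benchmarks; the function name, the input shape and the choice of measuring delays in seconds are assumptions.

```python
import math

def adoption_time_at_fraction(adoption_times_s, cluster_size, fraction):
    """Delay by which `fraction` of all cluster nodes have adopted a block.

    adoption_times_s: per-node adoption delays in seconds (assumed unit),
                      one entry per node that adopted the block.
    cluster_size:     total number of nodes in the cluster.
    Returns None if fewer nodes adopted the block than the fraction requires.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Round before ceil to avoid float artifacts such as 3.0000000000000004.
    needed = math.ceil(round(fraction * cluster_size, 9))
    if len(adoption_times_s) < needed:
        return None
    return sorted(adoption_times_s)[needed - 1]
```

For a 52-node cluster, a fraction of 0.80 requires 42 nodes to have adopted the block, so the function returns the 42nd smallest delay.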
Conclusion
- 10.7.1 contains major optimizations in Ledger and Consensus, as well as various smaller ones such as in the metrics system. The large improvement in CPU usage confirms their effectiveness.
- The benchmarks are designed to amplify trends. To manage expectations: the observed improvements will be somewhat less prominent on Mainnet; furthermore, some of the optimizations target block producers only.
- A space leak that was present on 10.7.0 is now provably absent from 10.7.1.
- 10.7.1 is a small, but consistent improvement over 10.6.2 as far as block production, diffusion and adoption metrics are concerned.
- Conversely, not a single performance regression could be observed on 10.7.1.
Attachments
Full comparison for value-only workload, PDF downloadable here.
Full comparison for Plutus workload, PDF downloadable here.
