Benchmarking -- Node 10.6.2 | Cardano Development Updates

Setup

As part of the release benchmarking cycle, we're comparing benchmarking runs for 2 different versions of cardano-node:

10.5.4 - the current Node 10.5 Mainnet release
10.6.2 - the current Node 10.6 Mainnet release

For this benchmark, we're gathering various metrics under 2 different workloads:

value-only: Each txn consumes 2 inputs and creates 2 outputs, changing the UTxO set. Full blocks (> 80kB) exclusively; high submission pressure (TPS > 10).
Plutus: Each txn contains a Plutus script exhausting the per-tx execution budget. Small blocks (< 3kB) exclusively; low submission pressure (TPS < 1).

Benchmarking is performed on a cluster of 52 block producing nodes spread across 3 different AWS regions, interconnected using a static, restricted topology. All runs were performed in the Conway era.

Observations

These benchmarks are about evaluating specific corner cases in a constrained environment that allows for reliable reproduction of results; they're not trying to directly recreate the operational conditions on Mainnet.

Resource Usage

10.6.2 exhibits a clear 6% reduction in Process CPU usage under full saturation workload, and a 2% reduction under Plutus workload.
Allocation rate and Minor GCs are significantly reduced as well (~59% and 64% depending on workload).
Major GC events go up by 27% and 34%, depending on workload.
Observed CPU 85% spans exhibit a clear increase in duration -- ~4.1 and ~1.8 slots depending on workload.
RAM usage decreases by 19% and 24% depending on workload. However, this is a known result of optimizations in the benchmarking setup. From a separate benchmark with those optimizations applied on top of 10.5, we know that 10.6.2 exhibits a very minor increase (1% - 2%) in average Heap size, with a reduction in maximum Heap size under saturation workload only.

Caveat: Individual metrics can't be evaluated in isolation; the resource usage profile as a whole provides insight into the system's performance and responsiveness.

Forging Loop

We can observe Ledger Ticking decrease by 3ms - 5ms.
Under saturation workload only, Mempool snapshotting and Forged to Sending exhibit small increases by 2ms each.
As a result, a block producer is able to announce a new header 2ms - 3ms earlier into a slot (depending on workload).
Self adoption on the forging node also decreases by 2ms - 3ms.

The metric 'Slot start to announced' (see attachments) is cumulative, and demonstrates how far into a slot the block producing node first announces the new header.
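To illustrate what "cumulative" means for such a metric, here is a minimal sketch that sums per-phase durations in the order they occur after slot start. The phase list and all millisecond values are hypothetical placeholders; they are neither measured data nor the complete set of phases traced by the benchmark.

```python
from itertools import accumulate

# Hypothetical per-phase durations in milliseconds, in the order the phases
# occur after slot start. Names and values are illustrative, not measured.
phases = [
    ("ledger_ticking", 25),
    ("mempool_snapshotting", 40),
    ("forged_to_sending", 10),
]


def cumulative_offsets(phases):
    """Return (phase, ms since slot start at which that phase completes)."""
    names = [name for name, _ in phases]
    totals = accumulate(duration for _, duration in phases)
    return list(zip(names, totals))


offsets = cumulative_offsets(phases)
# The cumulative value after the last phase is how far into the slot we are.
slot_start_to_announced = offsets[-1][1]
print(offsets, slot_start_to_announced)
```

Because the metric is a running sum, a change in any single phase shifts the final value by the same amount unless another phase compensates.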

Peer propagation

Block Fetch duration decreases by 2% (3ms or 5ms, depending on workload).
For small blocks, Adoption times on the peers increase by 2ms; for large blocks, however, they decrease by 3ms.

End-to-end propagation

This metric encompasses block diffusion and adoption across specific percentages of the benchmarking cluster, with 0.80 adoption meaning adoption on 80% of all cluster nodes.
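As a rough sketch of how such a cluster adoption value can be derived for a single block: take each node's adoption time, sort the values, and pick the time at which the requested fraction of nodes has adopted. The function name, the input shape, and the sample times below are assumptions made for illustration; this is not the benchmark's actual analysis code.

```python
import math


def cluster_adoption_time(adoption_times_s, fraction):
    """Time by which `fraction` of the nodes have adopted a given block.

    adoption_times_s: one adoption time (seconds, relative to a common
    reference point such as slot start) per cluster node.
    """
    if not adoption_times_s:
        raise ValueError("need at least one adoption time")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(adoption_times_s)
    # Smallest node count covering the fraction; round() guards against
    # floating-point noise before taking the ceiling.
    needed = math.ceil(round(fraction * len(ordered), 9))
    return ordered[needed - 1]


# Hypothetical adoption times for a 10-node cluster, in seconds.
times = [0.31, 0.35, 0.36, 0.40, 0.42, 0.45, 0.47, 0.52, 0.60, 1.10]
print(cluster_adoption_time(times, 0.80))
print(cluster_adoption_time(times, 1.00))
```

With these sample values, 0.80 adoption is reached when the eighth-fastest node adopts, while 1.00 adoption is determined entirely by the slowest node.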

Cluster adoption metrics on 10.6.2 exhibit no significant change under high submission / large blocks workload.
Under low submission / small blocks workload, there are small improvements of 1% - 3% in the body of the distribution, and a 10% increase in the 100th centile.
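The difference between the body of a distribution and its 100th centile can be illustrated with a small sketch. The 100th centile is simply the maximum, so a single slow sample determines it, whereas lower centiles are unaffected by one outlier. The nearest-rank centile definition and the sample data below are assumptions chosen for illustration and do not reproduce the benchmark's data or tooling.

```python
import math


def centile(samples, p):
    """Nearest-rank centile: smallest sample with at least p% of samples <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(round(p / 100 * len(ordered), 9)))
    return ordered[rank - 1]


def relative_change(baseline, candidate):
    """Change of candidate relative to baseline, as a fraction."""
    return (candidate - baseline) / baseline


# Hypothetical samples (seconds): identical except for one slow outlier.
baseline = [1.0] * 49 + [2.0]
candidate = [1.0] * 49 + [2.2]

for p in (96, 98, 100):
    change = relative_change(centile(baseline, p), centile(candidate, p))
    print(f"{p}th centile: {change:+.0%}")
```

In this constructed example only the 100th centile moves; the 96th and 98th centiles are identical across both runs.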

Conclusion

10.6.2 is more efficient in its usage of CPU time. Considered in conjunction with the increase in CPU 85% spans, this points to a redistribution of work which results in fewer bursts and more plateaus.
Seeing there is no negative impact on block production and diffusion metrics, and considering the overall decrease in CPU usage, the CPU 85% span increase can't be interpreted as a risk to performance or responsiveness.
The RAM increase observed on 10.6.0-pre has been successfully fixed in 10.6.2.
10.6.2 is more efficient with respect to block production and diffusion.
Adoption metrics are largely equivalent to 10.5.4; the 10% increase in the Plutus workload's 100th centile is an outlier resulting from the benchmark's very restrictive topology. Unless the increase also manifests in the 98th and 96th centiles, or below, it is not considered a risk.
From a performance perspective, we can determine 10.6.2 to be regression-free and give it a clean bill of health.

Attachments

Full comparison for value-only workload, PDF downloadable here.

Full comparison for Plutus workload, PDF downloadable here.
