
Benchmarking -- Node 10.6.0-pre

Michael Karg

Setup

As part of the release benchmarking cycle, we're comparing benchmarking runs for 2 different versions of cardano-node:

  • 10.5 - baseline from the previous Node release
  • 10.6.0-pre - the current (pre-)release tag

For this benchmark, we're gathering various metrics under 2 different workloads, both sketched in code below:

  1. value-only: Each transaction consumes 2 inputs and creates 2 outputs, changing the UTxO set. This workload produces full blocks (> 80kB) exclusively.
  2. Plutus: Each transaction contains a Plutus script exhausting the per-tx execution budget. This workload produces small blocks (< 3kB) exclusively.
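
As a rough sketch only (these types are illustrative, not the actual tx-generator API), the two workload shapes can be pictured as:

```haskell
-- Illustrative types only; the actual workloads are configured through
-- cardano-node's benchmarking tooling, not defined like this.
data Workload
  = ValueOnly { inputsPerTx :: Int, outputsPerTx :: Int }
  | Plutus    { execBudgetUsed :: Double }  -- fraction of the per-tx budget

-- value-only: 2 inputs, 2 outputs per tx; produces full blocks (> 80kB)
valueOnly :: Workload
valueOnly = ValueOnly { inputsPerTx = 2, outputsPerTx = 2 }

-- Plutus: each script exhausts the per-tx execution budget; small blocks (< 3kB)
plutusSaturating :: Workload
plutusSaturating = Plutus { execBudgetUsed = 1.0 }
```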

Benchmarking is performed on a cluster of 52 block producing nodes spread across 3 different AWS regions, interconnected using a static, restricted topology. All runs were performed in the Conway era.
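
For intuition, a static, restricted topology can be pictured as a fixed adjacency list. The sketch below is hypothetical (the benchmark's actual topology is not a ring) and only illustrates the idea of a small, fixed peer set per node:

```haskell
type NodeId = Int

-- Hypothetical ring-like layout over 52 nodes, each with exactly two
-- downstream peers; it stands in for "static, restricted", nothing more.
staticTopology :: [(NodeId, [NodeId])]
staticTopology = [ (n, [(n + 1) `mod` 52, (n + 2) `mod` 52]) | n <- [0 .. 51] ]
```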

Observations

These benchmarks are about evaluating specific corner cases in a constrained environment that allows for reliable reproduction of results; they're not trying to directly recreate the operational conditions on Mainnet.

Resource Usage

  1. 10.6.0-pre exhibits a slight shift in CPU usage. It consumes 3% less CPU time under saturation, whereas under a low submission workload it consumes 4% more.
  2. Allocation rate and the impact of minor GCs are significantly reduced, by ~45% and ~59% depending on workload. This takes much pressure off the garbage collector.
  3. RAM usage increases significantly by 0.9GiB - 1.1GiB (15% - 17% depending on workload); the sketch below derives the approximate baseline this implies.
  4. Observed CPU 85% spans (periods during which CPU usage stays above 85%) are longer: ~3.1 slots under value and ~1.6 slots under Plutus workload.
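
For a quick sanity check on the RAM figures, the absolute and relative deltas together imply the approximate baseline footprint; a minimal sketch, using only the numbers above:

```haskell
-- If a delta d equals a fraction f of the baseline, the baseline is d / f.
impliedBaseline :: Double -> Double -> Double
impliedBaseline delta fraction = delta / fraction

main :: IO ()
main = do
  print (impliedBaseline 0.9 0.15)  -- ~6.0 GiB baseline at the low end
  print (impliedBaseline 1.1 0.17)  -- ~6.5 GiB baseline at the high end
```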

Caveat: Individual metrics can't be evaluated in isolation; the resource usage profile as a whole provides insight into the system's performance and responsiveness.

Forging Loop

  1. We observe slight increases in Ledger Ticking and Mempool Snapshotting, of 2ms each.
  2. This causes a block producer to announce a new header 3ms - 4ms (or 6% - 12%) later into a slot.
  3. Adoption time on the block producer also increases by 3ms - 4ms.

The metric 'Slot start to announced' (see attachments) is cumulative, and shows how far into a slot the block producing node first announces the new header.
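
Since the metric is cumulative, per-stage increases in the forging loop add up directly; a minimal sketch, with stage names and the decomposition assumed for illustration only:

```haskell
-- Hypothetical decomposition; the real forging loop has more stages.
data Stage = Stage { stageName :: String, millis :: Double }

-- 'Slot start to announced' as the sum of everything preceding the
-- header announcement. A +2ms increase in both Ledger Ticking and
-- Mempool Snapshotting raises this sum by up to 4ms, consistent with
-- the observed 3ms - 4ms later announcement.
slotStartToAnnounced :: [Stage] -> Double
slotStartToAnnounced = sum . map millis
```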

Peer propagation

  1. Under saturation workload only, Block Fetch duration increases by 7ms (or 2%).
  2. Adoption times on the peers increase by 2ms - 3ms.

End-to-end propagation

This metric encompasses block diffusion and adoption across specific percentages of the benchmarking cluster, with 0.80 adoption meaning adoption on 80% of all cluster nodes.
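
A minimal sketch of how such a centile can be read off per-node adoption times (illustrative; not the actual analysis pipeline):

```haskell
import Data.List (sort)

-- Given one adoption time per cluster node (measured from slot start),
-- the p-centile is the time by which a fraction p of nodes has adopted
-- the block; e.g. adoptionCentile 0.80 times for 80% of the cluster.
adoptionCentile :: Double -> [Double] -> Double
adoptionCentile p times =
  let sorted = sort times
      idx    = ceiling (p * fromIntegral (length sorted)) - 1
  in sorted !! max 0 idx
```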

  1. Cluster adoption metrics on 10.6.0-pre exhibit a small 2% - 3% increase across all centiles.
  2. Under Plutus workload only, the increase becomes superlinear in the 98th - 100th centiles, with an extra 89ms (or 18%) required for full cluster adoption.

Conclusion

  1. The small increases in block production, diffusion and adoption metrics do not pose a performance risk to the network.
  2. The benchmark's restricted topology forces a regression in the tail end of the adoption metric distribution to surface; in a live network, this is mitigated by a much higher number of connected peers and by peer sharing.
  3. The increase in RAM usage has so far not manifested on relay nodes deployed in a live network; as this is a pre-release, the precise effect of our benchmark (exposing block producers to high pressure over extended time) is under investigation.

Attachments

Full comparison for value-only workload, PDF downloadable here.

Full comparison for Plutus workload, PDF downloadable here.