
Benchmarking -- Node 10.4.1

Michael Karg · 4 min read

Setup

As part of the release benchmarking cycle, we're comparing benchmarking runs for 2 different versions of cardano-node:

  • 10.3.1 - baseline from the previous Node release
  • 10.4.1 - the current release

For this benchmark, we're gathering various metrics under 2 different workloads:

  1. value-only: Each transaction consumes 2 inputs and creates 2 outputs, changing the UTxO set. This workload produces full blocks (> 80kB) exclusively.
  2. Plutus: Each transaction contains a Plutus script exhausting the per-tx execution budget. This workload produces small blocks (< 3kB) exclusively.
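
To make the two transaction shapes concrete, here is a minimal sketch of the parameters that distinguish the workloads, using hypothetical types rather than the actual benchmarking workbench configuration:

```haskell
-- Hypothetical types for illustration only; the actual workbench
-- configuration in cardano-node is structured differently.
data Workload
  = ValueOnly  { inputsPerTx :: Int, outputsPerTx :: Int }
  | PlutusLoop { budgetFractionPerTx :: Double }
  deriving Show

-- The two profiles benchmarked here: value-only transactions with
-- 2 inputs and 2 outputs, and Plutus transactions whose script
-- exhausts the full per-tx execution budget.
valueWorkload, plutusWorkload :: Workload
valueWorkload  = ValueOnly  { inputsPerTx = 2, outputsPerTx = 2 }
plutusWorkload = PlutusLoop { budgetFractionPerTx = 1.0 }
```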

Benchmarking is performed on a cluster of 52 block-producing nodes spread across 3 different AWS regions, interconnected using a static, restricted topology. All runs were performed in the Conway era.

10.4.1 features V2InMemory, the UTxO-HD in-memory backing store of LedgerDB, which replaces the in-memory representation of UTxO entries used in 10.3 and earlier.
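
To illustrate what such an abstraction buys, here is a minimal sketch of an exchangeable backing store interface with an in-memory implementation in the spirit of V2InMemory; all names are hypothetical and do not mirror the actual LedgerDB API:

```haskell
import           Data.IORef      (modifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as Map

-- The ledger layer programs against this interface, so the concrete
-- store (in-memory, on-disk, ...) can be exchanged freely.
data BackingStore k v = BackingStore
  { bsRead  :: k -> IO (Maybe v)        -- look up a UTxO entry
  , bsWrite :: [(k, Maybe v)] -> IO ()  -- Just inserts/updates, Nothing deletes
  }

-- An in-memory implementation backed by a strict Map, analogous in
-- spirit (though not in detail) to the V2InMemory backing store.
newInMemoryStore :: Ord k => IO (BackingStore k v)
newInMemoryStore = do
  ref <- newIORef Map.empty
  pure BackingStore
    { bsRead  = \k -> Map.lookup k <$> readIORef ref
    , bsWrite = \kvs -> modifyIORef' ref $ \m ->
        foldl (\acc (k, mv) -> maybe (Map.delete k) (Map.insert k) mv acc) m kvs
    }
```

An on-disk implementation satisfying the same interface would trade the RAM cost discussed below for additional I/O.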

Observations

These benchmarks are about evaluating specific corner cases in a constrained environment that allows for reliable reproduction of results; they're not trying to directly recreate the operational conditions on Mainnet.

Resource Usage

  1. On 10.4.1 under value workload, Heap size increases slightly by 2%, and by 5% under Plutus workload. This corresponds to using ~170MiB-390MiB of additional RAM (see the back-of-the-envelope sketch after this list).
  2. Allocation rate and GC impact are virtually unchanged.
  3. Process CPU usage improves slightly by 2% regardless of workload type.
  4. CPU 85% spans are slightly longer (~0.37 slots) under value workload, and slightly shorter (~0.33 slots) under Plutus workload.
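
As a back-of-the-envelope check that the relative and absolute heap figures above agree, assuming the lower delta belongs to the value workload and the higher one to Plutus:

```haskell
-- Baseline heap size implied by an absolute delta at a given relative
-- increase; round numbers taken from the list above, illustration only.
impliedBaselineMiB :: Double -> Double -> Double
impliedBaselineMiB deltaMiB relIncrease = deltaMiB / relIncrease

-- >>> impliedBaselineMiB 170 0.02   -- ~8500 MiB baseline, value workload
-- >>> impliedBaselineMiB 390 0.05   -- ~7800 MiB baseline, Plutus workload
```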

Caveat: Individual metrics can't be evaluated in isolation; the resource usage profile as a whole provides insight into the system's performance and responsiveness.

Forging Loop

  1. Under value workload, we observe a clear improvement in Mempool snapshotting of 9ms or 16% (2ms or 8% under Plutus workload).
  2. Self-Adoption time improves by 4ms or 5% (and remains virtually unchanged under Plutus workload).
  3. Hence a block producer is able to announce a new header 10ms or 9% earlier into the slot (1ms or 2% under Plutus workload).

The metric 'Slot start to announced' (see the attachments) is cumulative: it aggregates the durations of the successive stages of the forging loop, and shows how far into a slot the block-producing node first announces the new header.
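
A minimal sketch of what 'cumulative' means here, with hypothetical stage names; the actual forging loop comprises more stages than the two highlighted above:

```haskell
-- Durations (in ms) of successive forging loop stages within a slot;
-- field names are illustrative, not the node's actual metric names.
data ForgingStages = ForgingStages
  { mempoolSnapshot :: Double
  , selfAdoption    :: Double
  , otherStages     :: Double  -- remaining stages of the loop, lumped together
  }

-- The cumulative metric: how far into the slot the new header is announced.
slotStartToAnnounced :: ForgingStages -> Double
slotStartToAnnounced s = mempoolSnapshot s + selfAdoption s + otherStages s
```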

Peer propagation

  1. Under value workload, Fetch duration and Fetched to Sending improve slightly by 3ms (1%) and 2ms (4%), respectively.
  2. Under Plutus workload, Fetched to Sending shows a slightly longer delay of 2ms (5%).

End-to-end propagation

This metric encompasses block diffusion and adoption across specific percentages of the benchmarking cluster, with 0.80 adoption meaning adoption on 80% of all cluster nodes.
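
For instance, on this 52-node cluster, 0.80 adoption corresponds to roughly 42 nodes having adopted the block; a trivial, purely illustrative helper:

```haskell
-- Translate an adoption fraction into an approximate node count:
-- 0.80 * 52 = 41.6, i.e. about 42 of the 52 cluster nodes.
nodesAdopted :: Int -> Double -> Int
nodesAdopted clusterSize fraction = round (fraction * fromIntegral clusterSize)
```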

  1. Under value workload, cluster adoption times exhibit a small 1% - 3% improvement across all percentiles.
  2. Under Plutus workload, they show a small 1% - 2% increase across all percentiles (except at 0.80 adoption).

Conclusion

  1. We could not detect any regressions or performance risks to the network on 10.4.1.
  2. There is a small and reasonable price to pay in RAM usage for adding the LedgerDB abstraction and thus enabling exchangeable backing store implementations.
  3. On the other hand, CPU usage is reduced slightly by use of the in-memory backing store.
  4. 10.4.1 is beneficial in all cases for block production metrics; specifically, block producers will be able to announce new headers earlier into the slot.
  5. Network diffusion and adoption metrics vary only slightly and indicate 10.4.1 will deliver network performance comparable to 10.3.1.

Attachments

Full comparison for value-only workload, PDF downloadable here.

Full comparison for Plutus workload, PDF downloadable here.

NB. The benchmarks for 10.4.1 were performed on tag 10.4.0. The patch version bump did not include changes relevant to performance; thus, measurements performed on 10.4.0 remain valid. The same holds for 10.3.1 and 10.3.0.