
Benchmarking -- Node 10.5.0

July 2, 2025 · 4 min read

Performance and Tracing Team Lead

Setup

As part of the release benchmarking cycle, we're comparing benchmarking runs for 2 different versions of cardano-node:

10.4.1 - baseline from the previous Node release
10.5.0 - the current (pre-)release tag

For this benchmark, we're gathering various metrics under 2 different workloads:

value-only: Each transaction consumes 2 inputs and creates 2 outputs, changing the UTxO set. This workload produces full blocks (> 80kB) exclusively.
Plutus: Each transaction contains a Plutus script exhausting the per-tx execution budget. This workload produces small blocks (< 3kB) exclusively.

Benchmarking is performed on a cluster of 52 block producing nodes spread across 3 different AWS regions, interconnected using a static, restricted topology. All runs were performed in the Conway era.

Preliminaries

The feature in 10.5 with major performance impact is periodic ledger metrics. This is exclusive to the new tracing system.
10.5 flips the default config for PeerSharing to true; however, the recommendation is to explicitly set it to false on block producers. Privacy concerns aside, we also found a disadvantageous performance impact on block production when it is enabled. Hence, our benchmarks do not factor in that overhead.
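
To illustrate the PeerSharing recommendation for block producers, here is a minimal sketch that sets the value explicitly in a node configuration. It assumes the configuration is a JSON document with a top-level `PeerSharing` key; the helper name `disable_peer_sharing` and the trimmed example config are hypothetical and not taken from the benchmarking setup.

```python
import json


def disable_peer_sharing(config_text: str) -> str:
    """Return the configuration JSON with PeerSharing explicitly set to false.

    Assumption: the setting is a top-level boolean key named "PeerSharing".
    """
    config = json.loads(config_text)
    config["PeerSharing"] = False
    return json.dumps(config, indent=2)


# Hypothetical, heavily trimmed configuration, for illustration only.
example_config = '{"PeerSharing": true}'
patched = disable_peer_sharing(example_config)
print(patched)
```

Setting the key explicitly, rather than relying on the default, keeps a block producer's behaviour unchanged if the default flips again in a later release.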

Observations

These benchmarks are about evaluating specific corner cases in a constrained environment that allows for reliable reproduction of results; they're not trying to directly recreate the operational conditions on Mainnet.

Resource Usage

10.5.0 shows a clear reduction in CPU usage - by ~30% regardless of workload type.
Furthermore, Allocation rate and GC impact are clearly reduced - by 27%-29% and 24%-25% respectively.
Heap size increases very slightly under saturation (by 1%) and decreases very slightly (by 1%) under Plutus workload.
CPU 85% spans are slightly shorter (~0.2 slots) under saturation, and slightly longer (~0.26 slots) under Plutus workload.

Caveat: Individual metrics can't be evaluated in isolation; the resource usage profile as a whole provides insight into the system's performance and responsiveness.

Forging Loop

Block Context Acquisition time (prior to leadership check) is greatly reduced - from ~24ms to under 1ms.
Under saturation only, Ledger Ticking and Mempool Snapshotting exhibit very slight upticks (by 3ms and 2ms respectively).
Under Plutus workload only, Self Adoption on the forger exhibits a very slight uptick (by 3ms).
In summary, a block producer is able to announce a new header 20ms or 21% earlier into the slot (22ms or 43% under Plutus workload).

The metric 'Slot start to announced' (see in attachments) is cumulative, and demonstrates how far into a slot the block producing node first announces the new header.
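
As a rough illustration of what "cumulative" means here, the sketch below adds up per-phase durations of a forging loop into running offsets from slot start. The phase names, their ordering, and all durations are hypothetical placeholders, not measured values from the reports.

```python
from itertools import accumulate

# Hypothetical phase durations in milliseconds; names, order and values
# are illustrative assumptions only.
phases = [
    ("block context acquired", 1),
    ("leadership checked", 1),
    ("ledger ticked", 30),
    ("mempool snapshot taken", 45),
    ("block forged", 2),
    ("header announced", 1),
]


def cumulative_offsets(phases):
    """Pair each phase with the time elapsed since slot start when it completes."""
    running_totals = accumulate(duration for _, duration in phases)
    return [(name, total) for (name, _), total in zip(phases, running_totals)]


offsets = cumulative_offsets(phases)
# The offset of the last phase plays the role of 'Slot start to announced'.
slot_start_to_announced = offsets[-1][1]
print(slot_start_to_announced)
```

Because the metric is a running sum, a saving in any early phase carries through to the announcement time unless a later phase takes correspondingly longer.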

Peer propagation

Under saturation workload only, Block Fetch duration increases by 14ms (or 4%).
Under saturation, block adoption is slightly faster (by 3ms), while under Plutus workload it's slightly slower (by 2ms).

End-to-end propagation

This metric encompasses block diffusion and adoption across specific percentages of the benchmarking cluster, with 0.80 adoption meaning adoption on 80% of all cluster nodes.
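
A minimal sketch of how such a cluster adoption figure can be derived for a single block, assuming one adoption timestamp per node measured from slot start. The function name and the sample timings are hypothetical; the actual analysis tooling may compute this differently.

```python
import math


def adoption_time_at_fraction(adoption_times_ms, fraction):
    """Return the time by which `fraction` of all nodes have adopted the block."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(adoption_times_ms)
    # Number of nodes that must have adopted; rounding guards against float noise.
    needed = math.ceil(round(fraction * len(ordered), 9))
    return ordered[needed - 1]


# Hypothetical adoption times (ms after slot start) on a 10-node cluster.
sample = [310, 250, 420, 280, 390, 330, 300, 460, 350, 270]
print(adoption_time_at_fraction(sample, 0.80))
```

With these sample values, 0.80 adoption is reached once the eighth-fastest node has adopted the block.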

Under saturation workload, cluster adoption times on 10.5.0 are identical to those on 10.4.1.
Under Plutus workload, they show a moderate 3% - 5% improvement, with 7% in the 50th percentile.

Conclusion

We could not detect any regressions or performance risks to the network on 10.5.0.
CPU usage is clearly reduced.
The forging loop executes faster, and new header announcements happen earlier.
Diffusion / adoption metrics exhibit a small overall improvement and indicate 10.5.0 will deliver network performance at least comparable to 10.4.1.
All improvements listed above hinge on the ledger metrics feature and will materialize only when using the new tracing system. Using the legacy system, 10.5.0 performance is expected to be almost identical to 10.4.1.

Attachments

Full comparison for value-only workload, PDF downloadable here.

Full comparison for Plutus workload, PDF downloadable here.

NB. The benchmarks for 10.5.0 extend to a potential 10.5.1 tag, as that won't include any changes with a performance impact; thus, measurements performed on 10.5.0 remain valid.

Benchmarking -- Node 10.4.1

May 5, 2025 · 4 min read

Michael Karg

Performance and Tracing Team Lead

Setup

As part of the release benchmarking cycle, we're comparing benchmarking runs for 2 different versions of cardano-node:

10.3.1 - baseline from the previous Node release
10.4.1 - the current release

For this benchmark, we're gathering various metrics under 2 different workloads:

value-only: Each transaction consumes 2 inputs and creates 2 outputs, changing the UTxO set. This workload produces full blocks (> 80kB) exclusively.
Plutus: Each transaction contains a Plutus script exhausting the per-tx execution budget. This workload produces small blocks (< 3kB) exclusively.

10.4.1 features the UTxO-HD in-memory backing store V2InMemory of LedgerDB, which replaces the in-memory representation of UTxO entries in 10.3 and prior.

Observations

Resource Usage

On 10.4.0 under value workload, Heap size increases slightly by 2%, and 5% under Plutus workload. This corresponds to using ~170MiB-390MiB additional RAM.
Allocation rate and GC impact are virtually unchanged.
Process CPU usage improves slightly by 2% regardless of workload type.
CPU 85% spans are slightly (~0.37 slots) longer under value workload, and slightly shorter (~0.33) under Plutus workload.

Caveat: Individual metrics can't be evaluated in isolation; the resource usage profile as a whole provides insight into the system's performance and responsiveness.

Forging Loop

We can observe a clear improvement in Mempool snapshotting by 9ms or 16% (2ms or 8% under Plutus workload).
Self-Adoption time improves by 4ms or 5% (and remains virtually unchanged under Plutus workload).
Hence a block producer is able to announce a new header 10ms or 9% earlier into the slot (1ms or 2% under Plutus workload).

The metric 'Slot start to announced' (see in attachments) is cumulative, and demonstrates how far into a slot the block producing node first announces the new header.

Peer propagation

Under value workload, Fetch duration and Fetched to Sending improve slightly by 3ms (1%) and 2ms (4%).
Under Plutus workload, Fetched to Sending has a slightly longer delay - 2ms (or 5%).

End-to-end propagation

This metric encompasses block diffusion and adoption across specific percentages of the benchmarking cluster, with 0.80 adoption meaning adoption on 80% of all cluster nodes.

Under value workload, cluster adoption times exhibit a small 1% - 3% improvement across all percentiles.
Under Plutus workload, they show a small 1% - 2% increase across all percentiles (except the 80th).

Conclusion

We could not detect any regressions or performance risks to the network on 10.4.1.
There is a small and reasonable price to pay in RAM usage for adding the LedgerDB abstraction and thus enabling exchangeable backing store implementations.
On the other hand, CPU usage is reduced slightly by use of the in-memory backing store.
10.4.1 is beneficial in all cases for block production metrics; specifically, block producers will be able to announce new headers earlier into the slot.
Network diffusion and adoption metrics vary only slightly and indicate 10.4.1 will deliver network performance comparable to 10.3.1.

Attachments

Full comparison for value-only workload, PDF downloadable here.

Full comparison for Plutus workload, PDF downloadable here.

NB. The benchmarks for 10.4.1 were performed on tag 10.4.0. The patch version bump did not include changes relevant to performance; thus, measurements performed on 10.4.0 remain valid. The same holds for 10.3.1 and 10.3.0.

Benchmarking -- Node 10.3.1

April 22, 2025 · 5 min read

Michael Karg

Performance and Tracing Team Lead

Setup

As part of the release benchmarking cycle, we're comparing benchmarking runs for 3 different versions of cardano-node:

10.2 - baseline from the previous release (built with GHC8.10.7)
10.3.0-ghc8107 - the current release built with GHC8.10.7
10.3.0-ghc965 - the current release built with GHC9.6.5

For this benchmark, we're gathering various metrics under 2 different workloads:

value-only: Each transaction consumes 2 inputs and creates 2 outputs, changing the UTxO set. This workload produces full blocks (> 80kB) exclusively.
Plutus: Each transaction contains a Plutus script exhausting the per-tx execution budget. This workload produces small blocks (< 3kB) exclusively.

10.3.1 supports two compiler versions, which will be taken into account when comparing performance of different builds of that release.

Observations

Resource Usage

10.3.1 exhibits a clear reduction in Process CPU usage, more prominent under value workload:
- value workload: 10% with GHC8, and 24% with GHC9.
- Plutus workload: 4% with GHC8, and 6% with GHC9.
There also is a reduction in RAM usage, more prominent under Plutus workload:
- value workload: 1% or ~54MiB with GHC8, and 6% or ~574MiB with GHC9.
- Plutus workload: 14% or ~1.2GiB with GHC9 only.
Minor GCs and Allocation rate both drop on 10.3.1, more significantly under value workload:
- value workload: 11% each with GHC8, and 24% each with GHC9.
- Plutus workload: 3% and 1% with GHC8; 5% and 4% with GHC9.
Under value workload, CPU 85% spans increase by 45% with GHC8, but only by 14% with GHC9.
Under Plutus workload, those spans decrease by 5% with GHC8; even by 19% with GHC9.

Caveat: Individual metrics can't be evaluated in isolation; the resource usage profile as a whole provides insight into the system's performance and responsiveness.

Forging Loop

Under value workload, several block production metrics improve clearly on 10.3.1, most prominently Mempool Snapshotting.
With GHC8, the improvement is 23%, with further significant improvements in Adoption time (11%) and Ledger ticking (10%).
With GHC9, the improvement is 27%, with further significant improvements in Adoption time (10%) and Ledger ticking (17%).
Under value workload, this enables a block producer to announce a header earlier into the slot, namely by 23ms (GHC8) and by 28ms (GHC9).
Under Plutus workload, Adoption time increases by 3ms (6%) with GHC8, but decreases by 8ms (15%) with GHC9.
Furthermore, there are no significant changes to the header announcement timing.

The metric 'Slot start to announced' (see in attachments) is cumulative, and demonstrates how far into a slot the block producing node first announces the new header.

Peer propagation

Under value-only workload only, we observe an increase in Block Fetch duration: 7ms (2%) with GHC8, and 23ms (7%) with GHC9.
Block adoption times on the peers improve clearly: 11ms (12%) with GHC8, and 12ms (14%) with GHC9.
Under Plutus workload, however, similarly to the block producer, adoption times increase by 3ms (6%) with GHC8, but decrease by 7ms (13%) with GHC9.

End-to-end propagation

This metric encompasses block diffusion and adoption across specific percentages of the benchmarking cluster, with 0.80 adoption meaning adoption on 80% of all cluster nodes.

Under value workload, cluster adoption times on 10.3.1 are largely unchanged.
With GHC8, there are 5% and 3% improvements in the 50th and 100th centiles; with GHC9, there's a small 3% improvement in the 50th centile.
Under Plutus workload, with GHC8, there's a moderate 6% increase in cluster adoption times in the 100th centile.
With GHC9, however, there's a small 2% improvement in all but the 100th centile.

Conclusion

For 10.3.1 we could not detect any performance risks or regressions.
Improving resource usage was a stated goal for the 10.3 release; this could be confirmed via measurements for CPU and RAM usage as well as CPU spikes.
10.3.1 achieves network performance comparable to 10.2.1 using clearly fewer system resources - for both compiler versions.
Several key metrics improve on 10.3.1: Block producers announce a new header sooner into the slot; we observe lower adoption times (GHC9 only).
The GHC9.6.5 build has demonstrable performance advantages over the GHC8.10.7 build; especially the Plutus interpreter seems to gain considerably from using GHC9. For those reasons we now recommend GHC9.6.x for production builds.

Attachments

Full report for value-only workload, PDF downloadable here.

Full report for Plutus workload, PDF downloadable here.

NB. The benchmarks for 10.3.1 were performed on tag 10.3.0. The patch version bump did not include changes relevant to performance; thus, measurements performed on 10.3.0 remain valid.

Memory Budget Scaling -- 10.2

April 1, 2025 · 3 min read

Michael Karg

Performance and Tracing Team Lead

Setup

This report compares benchmarking runs for 3 different settings of the Plutus memory execution budget:

loop-memx1 - current mainnet memory execution budget
loop-memx1.5 - 1.5 x current mainnet memory execution budget
loop-memx2 - 2 x current mainnet memory execution budget

For this comparison, we gather various metrics under the Plutus workload used in release benchmarks: Each block produced during the benchmark contains 4 identical script transactions calibrated to fully exhaust the memory execution budget. Thus, script execution is constrained by the memory budget limit in every case. The workload produces small blocks (< 3kB) exclusively.

Benchmarking is performed on a cluster of 52 block producing nodes spread across 3 different AWS regions, interconnected using a static, restricted topology. Node version 10.2 was used, built with GHC8.10.7.

Observations

Resource Usage

Scaling the memory budget impacts Allocation Rate and Minor GCs. 1.5 x the budget results in rises of 5% and 6% respectively; for doubling the budget the corresponding rises are 10% and 11%.
Those increases seem to correlate linearly with raising mem budget.
The effect on CPU usage is almost negligible: a 1% (or 3%, for doubling the budget) increase of Process CPU.
The Node process RAM footprint is unaffected.
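
The apparent linearity can be sanity-checked with the percentages reported above: if the increase grows linearly with the budget, the increase per unit of additional budget should be about the same for both scaling factors. The sketch below is only arithmetic on the rounded figures from this report; the function and variable names are illustrative.

```python
# Reported increases in percent, keyed by memory budget scaling factor.
reported = {
    "Allocation Rate": {1.5: 5.0, 2.0: 10.0},
    "Minor GCs": {1.5: 6.0, 2.0: 11.0},
}


def increase_per_budget_unit(increases):
    """Percent increase divided by the additional budget (factor - 1)."""
    return {factor: pct / (factor - 1.0) for factor, pct in increases.items()}


slopes = {metric: increase_per_budget_unit(v) for metric, v in reported.items()}
for metric, per_factor in slopes.items():
    print(metric, per_factor)
```

Both metrics yield nearly the same slope at factor 1.5 and factor 2, which is consistent with a roughly linear relationship. With only two scaling factors measured, this is an indication rather than a proof.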

Caveat: Individual metrics can't be evaluated in isolation; the resource usage profile as a whole provides insight into the system's performance and responsiveness.

Forging Loop

Scaling the memory budget has significant impact on block adoption time only.
Scaling by factor 1.5 leads to a 14ms (or 25%) increase, whereas factor 2 leads to 28ms (49%).

Peer propagation

Same as on the block producer, scaling the memory budget has significant impact on block adoption times only.
Scaling by factor 1.5 leads to a 15ms (or 26%) increase, whereas factor 2 leads to 28ms (48%).

End-to-end propagation

This metric encompasses block diffusion and adoption across specific percentages of the benchmarking cluster, with 0.80 adoption meaning adoption on 80% of all cluster nodes.

1.5 x the memory budget results in a slight increase of 19ms - 22ms in cluster adoption times (4% - 5%).
2 x the memory budget results in a moderate 27ms - 34ms increase (5% - 7%, with 9% in the 50th centile).

Conclusion

These measurements outline the headroom for raising the memory budget, along with the expected performance impact:

Block adoption time is the only metric that's affected significantly, increasing both on the forger and the peers by the same extent.
These increases seem to correspond linearly with raising the memory budget. This gives excellent predictability of performance impact.
Expectedly, more allocations happen; we can observe the same linear correspondence here as well.
It has to be pointed out that block diffusion is only slightly affected by changing the execution budget: Due to pipelining, announcing and (re-)sending a block precedes adoption in most cases.
As such, regarding absolute cluster adoption times, measurements taken with either budget adjustment do not exhibit performance risks to the network. They do illustrate, however, the performance cost of those budget adjustments.

Attachment

Full report PDF downloadable here.

Memory Budget Scaling -- 10.3

April 1, 2025 · 4 min read

Michael Karg

Performance and Tracing Team Lead

Setup

This report compares benchmarking runs for 3 different settings of the Plutus memory execution budget:

10.3-ghc965 - current mainnet memory execution budget
loop-memx1.5 - 1.5 x current mainnet memory execution budget
loop-memx2 - 2 x current mainnet memory execution budget

Benchmarking is performed on a cluster of 52 block producing nodes spread across 3 different AWS regions, interconnected using a static, restricted topology. Node version 10.3 was used, built with GHC9.6.5. This is a re-run of the scaling benchmarks performed on Node version 10.2 / GHC8.10 to document impact of performance improvements. Those results were published here on Cardano Updates.

Observations

Resource Usage

Scaling the memory budget impacts Allocation Rate and Minor GCs. 1.5 x the budget results in rises of 5% each; for doubling the budget the corresponding rises are 8% and 9%.
Those increases seem to correlate linearly with raising mem budget.
The effects on CPU usage and RAM footprint are negligible for both scaling factors.

Caveat: Individual metrics can't be evaluated in isolation; the resource usage profile as a whole provides insight into the system's performance and responsiveness.

Forging Loop

Scaling the memory budget has significant impact on block adoption time only.
Scaling by factor 1.5 leads to a 10ms (or 24%) increase, whereas factor 2 leads to 21ms (50%).

Peer propagation

Same as on the block producer, scaling the memory budget has significant impact on block adoption times only.
Scaling by factor 1.5 leads to a 11ms (or 24%) increase, whereas factor 2 leads to 19ms (42%).

End-to-end propagation

This metric encompasses block diffusion and adoption across specific percentages of the benchmarking cluster, with 0.80 adoption meaning adoption on 80% of all cluster nodes.

1.5 x the memory budget results in a slight increase of 9ms - 31ms in cluster adoption times (3% - 6%).
2 x the memory budget results in a moderate 17ms - 35ms increase (5% - 7%).

Conclusion

These measurements outline the headroom for raising the memory budget, along with the expected performance impact:

Block adoption time is the only metric that's affected significantly, increasing both on the forger and the peers by the same extent.
These increases seem to correspond linearly with raising the memory budget. This gives excellent predictability of performance impact.
Expectedly, more allocations happen; we can observe the same linear correspondence here as well.
It has to be pointed out that block diffusion is only slightly affected by changing the execution budget: Due to pipelining, announcing and (re-)sending a block precedes adoption in most cases.
As such, regarding absolute cluster adoption times, measurements taken with either budget adjustment do not exhibit performance risks to the network. They do illustrate, however, the performance cost of those budget adjustments.

These scaling benchmarks are complementary to those performed on Node 10.2; in comparison with those, we can additionally conclude:

The conclusions from measurements for each scaling run set are identical.
While the relative increases in adoption time for both Node builds are quite similar, the absolute increases are 22% - 32% smaller (i.e., adoption happens more efficiently) for 10.3 / GHC9.6.
The same rationale applies to end-to-end propagation metrics: Absolute values document faster cluster adoption for 10.3 / GHC9.6.
Incidentally, the absolute values for scaling factor 1 on 10.2 are close to those for scaling factor 2 on 10.3 except for the tail end (i.e. 95th percentile and above).
This reflects the performance improvements that were a stated goal for the 10.3 release - and suggests the performance cost of memory budget increases has become slightly smaller in absolute terms.

As adoption times are not impacted by Plutus execution alone, we still advocate for a conservative and/or multi-stage raise; future backpedaling on budget limits could cause issues for scripts already deployed.

Attachment

Full report PDF downloadable here.

Benchmarking -- UTxO-HD on 10.2

February 21, 2025 · 3 min read

Michael Karg

Performance and Tracing Team Lead

Setup

This report compares benchmarking runs for 2 different flavours of cardano-node:

10.2-regular - regular Node performance baseline from the 10.2.x release benchmarks.
10.2-utxohd - the UTxO-HD build of the Node based on that same version.

For this benchmark, we're gathering various metrics under the value-only workload used in release benchmarks: Each transaction consumes 2 inputs and creates 2 outputs, changing the UTxO set. This workload produces full blocks (> 80kB) exclusively. Moreover, it's the workload that produces most stress on the UTxO set. Thus, it's the most meaningful workload when it comes to benchmarking UTxO-HD.

We target the in-memory backing store of UTxO-HD - LedgerDB V2 in this case. The on-disk backend is not used.

Benchmarking is performed on a cluster of 52 block producing nodes spread across 3 different AWS regions, interconnected using a static, restricted topology.

Observations

Resource Usage

With UTxO-HD's in-memory backend, the memory footprint increases slightly by 3%.
Process CPU usage is moderately reduced by 9% with UTxO-HD.
Additionally, CPU 85% spans decrease in duration by 24% (~1.1 slots).

Caveat: Individual metrics can't be evaluated in isolation; the resource usage profile as a whole provides insight into the system's performance and responsiveness.

Forging Loop

Block context acquisition improves by 3ms (or 11%), while Ledger ticking takes 3ms (or 10%) longer.
Creating a mempool snapshot is significantly faster - by 16ms (or 21%).
As a result, a UTxO-HD block producing node is able to announce a new header 17ms (or 12%) earlier into a slot.
Additionally, adoption time on the forger is slightly improved - by 4ms (or 5%).

Peer propagation

Block fetch duration increases moderately by 13ms or 4%.
Adoption times on the peers improve very slightly - by 2ms or 2%.

End-to-end propagation

This metric encompasses block diffusion and adoption across specific percentages of the benchmarking cluster, with 0.80 adoption meaning adoption on 80% of all cluster nodes.

There is no significant difference in cluster adoption times between regular and UTxO-HD node.

Conclusion

Regarding the UTxO-HD build using the in-memory LedgerDB V2 backend, we can conclude that:

it is lighter on CPU usage compared to the regular node, albeit requiring just slightly more RAM.
it poses no performance risk to block producers; on the contrary, the changes in forging loop metrics seem favourable compared to the regular node.
network performance would be expected to be on par with the regular node.
even under stress, there is no measurable performance regression compared to the regular node.
as a consequence of the above, performance-wise, it's a viable replacement for the regular in-memory solution.

Attachment

Full report for value-only workload, PDF downloadable here.

Benchmarking -- Node 10.2.1

February 21, 2025 · 3 min read

Michael Karg

Performance and Tracing Team Lead

Setup

As part of the release benchmarking cycle, we're comparing benchmarking runs for 2 different versions of cardano-node:

10.1.4 - baseline from a previous mainnet release
10.2.1 - the current release

For this benchmark, we're gathering various metrics under 2 different workloads:

value-only: Each transaction consumes 2 inputs and creates 2 outputs, changing the UTxO set. This workload produces full blocks (> 80kB) exclusively.
Plutus: Each transaction contains a Plutus script exhausting the per-tx execution budget. This workload produces small blocks (< 3kB) exclusively.

Observations

Resource Usage

CPU usage increases moderately by 12% under value, and very slightly by 2% under Plutus workload.
CPU 85% spans increase by 14% (~0.6 slots) under value workload, but decrease by 6% (~0.8 slots) under Plutus workload.
Only under value workload, we observe a slight increase in Allocation rate and Minor GCs of 9% and 8% respectively.

Caveat: Individual metrics can't be evaluated in isolation; the resource usage profile as a whole provides insight into the system's performance and responsiveness.

Forging Loop

Adoption time on the forger improves by 3ms (or 4%) - and 5ms (or 9%) under Plutus workload.
Block context acquisition takes 3ms (or 12%) longer under value workload.
Under Plutus workload only, ledger ticking improves by 3ms (or 12%).

The metric 'Slot start to announced' (see in attachments) is cumulative, and demonstrates how far into a slot the block producing node first announces the new header.

Peer propagation

Block fetch duration improves clearly by 16ms (or 4%) under value-only workload.
Under Plutus workload, we can measure an improvement by 4ms (or 7%) for adoption times on the peers.

End-to-end propagation

This metric encompasses block diffusion and adoption across specific percentages of the benchmarking cluster, with 0.80 adoption meaning adoption on 80% of all cluster nodes.

As a result of the above, 10.2.1 exhibits:

a slight 3% improvement in cluster adoption times in the 80th centile and above under value workload.
a near-jitter 1% - 3% improvement in cluster adoption times under Plutus workload.

Conclusion

We could not detect any significant regressions, or performance risks, on 10.2.1.
10.2.1 comes with slightly increased CPU usage, and no changes to RAM footprint.
Diffusion metrics very slightly improve - mainly due to block fetch being more efficient for full blocks, and adoption for blocks exclusively containing Plutus transactions.
This points to network performance of 10.2.1 being on par with or very slightly better than 10.1.4.

Attachments

Full report for value-only workload, PDF downloadable here.

Full report for Plutus workload, PDF downloadable here.

NB. The benchmarks for 10.2.1 were performed on tag 10.2.0. The patch version bump did not include changes relevant to performance; thus, measurements and observations performed on 10.2.0 remain valid.

Benchmarking -- Node 10.1.4

January 10, 2025 · 3 min read

Michael Karg

Performance and Tracing Team Lead

Setup

As part of the release benchmarking cycle, we're comparing benchmarking runs for 2 different versions of cardano-node:

10.1.1 - baseline from a previous mainnet release
10.1.4 - the current mainnet release

For this benchmark, we're gathering various metrics under 2 different workloads:

value-only: Each transaction consumes 2 inputs and creates 2 outputs, changing the UTxO set. This workload produces full blocks (> 80kB) exclusively.
Plutus: Each transaction contains a Plutus script exhausting the per-tx execution budget. This workload produces small blocks (< 3kB) exclusively.

Observations

Resource Usage

CPU 85% spans slightly increase by 6% or ~0.2 slots (26% or ~2.9 slots under Plutus workload).
We can observe a tiny increase in memory usage by 1-2% (132-160 MiB).

Caveat: Individual metrics can't be evaluated in isolation; the resource usage profile as a whole provides insight into the system's performance and responsiveness.

Forging Loop

Under value workload, Ledger Ticking and Self Adoption exhibit a very slight increase (2ms each).
Block Context Acquisition has improved by 2ms.
Under Plutus workload, there are no significant changes to forger metrics.

The metric 'Slot start to announced' (see in attachments) is cumulative, and demonstrates how far into a slot the block producing node first announces the new header.

Peer propagation

There's a minor increase of 1% (3ms) in Block Fetch duration under value workload only.
Under Plutus workload, we can measure a small improvement by 2% for adoption times on the peers.

End-to-end propagation

This metric encompasses block diffusion and adoption across specific percentages of the benchmarking cluster, with 0.80 adoption meaning adoption on 80% of all cluster nodes.

As a result of the above, on 10.1.4 we can observe:

a tiny increase in cluster adoption times of 1%-2% in the 80th centile and above under value workload.
an improvement in cluster adoption times of 3%-4% in the tail end (95th centile and above) under Plutus workload.

Conclusion

For 10.1.4, we could not detect any regressions or performance risks.
All increases or decreases in forger and peer metrics are 3ms or less. This indicates network performance of 10.1.4 will very closely match that of 10.1.1 and subsequent patch releases.
There's no significant change in the resource usage pattern. The increased CPU 85% spans tend to barely manifest when the system is under heavy load (value workload); as such, they pose no cause for concern.

Attachments

Full report for value-only workload, PDF downloadable here.

Full report for Plutus workload, PDF downloadable here.

NB. The benchmarks for 10.1.1 were performed on tag 10.0.0-pre. The minor version bump did not include changes relevant to performance; thus, measurements taken on 10.0.0-pre remain a valid baseline.

Benchmarking -- Node 10.1.1

October 31, 2024 · 4 min read

Michael Karg

Performance and Tracing Team Lead

Setup

As part of the release benchmarking cycle, we're comparing benchmarking runs for 2 different versions of cardano-node:

9.2.0 - baseline from a previous mainnet release
10.1.1 - the current mainnet release

For this benchmark, we're gathering various metrics under 3 different workloads:

value-only: Each transaction consumes 2 inputs and creates 2 outputs, changing the UTxO set. This workload produces full blocks (> 80kB) exclusively.
Plutus: Each transaction contains a Plutus script exhausting the per-tx execution budget. This workload produces small blocks (< 3kB) exclusively.
value+voting: On top of above value workload, this one has DReps vote on and ratify governance actions - forcing additional computation for vote tallying and proposal enactment.

Observations

Resource Usage

10.1.1 shows an improvement of 4% (8% under Plutus workload) in Process CPU usage.
Allocation Rate improves by 8% (11% under Plutus workload), while Heap Size remains unchanged.
CPU 85% spans decrease by 18% (5% under Plutus workload).
Compared to value-only workload, ongoing voting leads to a slight increase of 5% in Process CPU usage.

Caveat: Individual metrics can't be evaluated in isolation; the resource usage profile as a whole provides insight into the system's performance and responsiveness.

Forging Loop

Under Plutus workload, 10.1.1 exhibits a formidable speedup of 70ms in the forging loop - due to mempool snapshots being produced much more quickly.
Under value workload, there are no significant changes to forger metrics.
With voting added on top of the value workload, we can observe mempool snapshots and adoption time on the block producer rise by 10ms each.

The metric 'Slot start to announced' (see in attachments) is cumulative, and demonstrates how far into a slot the block producing node first announces the new header.

Peer propagation

Block Fetch duration increases slightly by 16ms (or 5%) under value workload.
Under Plutus workload, there are no significant changes to peer-related metrics.
With the additional voting workload, peer adoption times rise by 12ms on average - confirming the observation for adoption time on the block producer.

End-to-end propagation

This metric encompasses block diffusion and adoption across specific percentages of the benchmarking cluster, with 0.80 adoption meaning adoption on 80% of all cluster nodes.

10.1.1 exhibits a slight increase of 2% - 3% in cluster adoption times under value workload.
Under Plutus workload however, we observe significant improvement of 18% up to the 50th centile, and 9% - 13% in the 80th centile and above.
While the former is due to slightly increased Block Fetch duration, the latter is the consequence of much quicker mempool snapshots involving Plutus transactions.
Submitting the additional voting workload, we can observe a consistent 4% - 6% increase in cluster adoption times across all centiles.

Conclusion

We do not detect any performance regression in 10.1.1 compared to 9.2.0.
To the contrary - 10.1.1 is lighter on the Node process resource usage overall.
Improved forging and diffusion timings can be expected for blocks heavy on Plutus transactions.
Stressing the governance / voting capabilities of the Conway ledger lets us ascertain an (expected) performance cost of voting.
This cost has proven to be reasonable, and does not harbor lurking performance risks to the system.
It is expected to manifest only during periods of heavy vote tallying / proposal enactment, slightly affecting block adoption times.

NB. The same number of DReps is registered for each workload. However, only under value+voting do they become active by submitting votes. This requires an increased UTxO set size, so it uses a baseline separate from value-only, resulting in slightly different absolute values.

Contact

As for publishing such benchmarking results, we are aware that more context and detail may be needed with regard to specific metrics or benchmarking methodology.

We are still looking to gather questions, both general and specific, so that we can provide a suitable FAQ and possibly improve presentation in the future.

Attachments

Full report for value-only workload, PDF downloadable here.

Full report for Plutus workload, PDF downloadable here.

Full report for value+voting workload, PDF downloadable here.

NB. The release benchmarks for 10.1.1 were performed on tag 10.0.0-pre. The minor version bump did not include changes relevant to performance; thus, measurements taken on 10.0.0-pre remain valid.

Benchmarking -- Node 8.9.0

March 13, 2024 · 3 min read

Michael Karg

Performance and Tracing Team Lead

Setup

As part of the release benchmarking cycle, we're comparing benchmarking runs for 3 different versions of cardano-node:

8.7.2 - baseline for previous mainnet release
8.8.0 - an intermediate reference point
8.9.0 - the next mainnet release

For each version, we're gathering various metrics under 2 different workloads:

value-only: Each transaction consumes 2 inputs and creates 2 outputs, changing the UTxO set. This workload produces full blocks (> 80kB) exclusively.
Plutus: Each transaction contains a Plutus script exhausting the per-tx execution budget. This workload produces small blocks (< 3kB) exclusively.

Observations

The observations stated refer to the direct comparison between the 8.7.2 and 8.9.0 versions.

Resource Usage

Overall CPU usage exhibits a small to moderate (5% - 8%) increase.
Memory usage is very slightly decreased by 1%.

Caveat: Individual metrics can't be evaluated in isolation; the resource usage profile as a whole provides insight into the system's performance and responsiveness.

Forging Loop

For full blocks, Mempool Snapshotting improves by 4% (or 3ms).
For small blocks, Self Adoption times improve by 8% (or 4ms).
All other forger metrics do not exhibit significant change.

The metric 'Slot start to announced' (see attachments) is cumulative, and shows how far into a slot the block producing node first announces the new header.

Peer propagation

For full blocks, Block Fetch duration shows a notable improvement by 10ms (or 3%).

End-to-end propagation

This metric encompasses block diffusion and adoption across specific percentages of the benchmarking cluster, with 0.80 adoption meaning adoption on 80% of all cluster nodes.

End-to-end propagation times on 8.9.0 exhibit a small improvement by 2% across all centiles for full blocks, whereas they remain largely unchanged for small blocks.

Conclusion

The performance changes observed between 8.9.0 and 8.7.2 are only minor - with 8.9.0 slightly improving on 8.7.2. Therefore, we'd expect 8.9.0 Mainnet performance to be akin to 8.7.2.
We have demonstrated that no performance regression has been introduced in 8.9.0.

Contact

As for publishing such benchmarking results, we are aware that more context and detail may be needed with regard to specific metrics or benchmarking methodology.

We are still looking to gather questions, both general and specific, so that we can provide a suitable FAQ and possibly improve presentation in the future.

Attachments

Full report for value-only workload, PDF downloadable here.

Full report for Plutus workload, PDF downloadable here.

NB. Mainnet release 8.7.3 did not include any performance-related changes; measurements taken on 8.7.2 remain valid.
