
· 4 min read
Michael Karg

Setup

As part of the release benchmarking cycle, we're comparing benchmarking runs for 2 different versions of cardano-node:

  • 10.3.1 - baseline from the previous Node release
  • 10.4.1 - the current release

For this benchmark, we're gathering various metrics under 2 different workloads:

  1. value-only: Each transaction consumes 2 inputs and creates 2 outputs, changing the UTxO set. This workload produces full blocks (> 80kB) exclusively.
  2. Plutus: Each transaction contains a Plutus script exhausting the per-tx execution budget. This workload produces small blocks (< 3kB) exclusively.

Benchmarking is performed on a cluster of 52 block producing nodes spread across 3 different AWS regions, interconnected using a static, restricted topology. All runs were performed in the Conway era.

10.4.1 features the UTxO-HD in-memory backing store V2InMemory of LedgerDB, which replaces the in-memory representation of UTxO entries used in 10.3 and earlier.

Observations

These benchmarks are about evaluating specific corner cases in a constrained environment that allows for reliable reproduction of results; they're not trying to directly recreate the operational conditions on Mainnet.

Resource Usage

  1. Under value workload, Heap size on 10.4.1 increases slightly by 2%, and by 5% under Plutus workload. This corresponds to ~170MiB-390MiB of additional RAM.
  2. Allocation rate and GC impact are virtually unchanged.
  3. Process CPU usage improves slightly by 2% regardless of workload type.
  4. CPU 85% spans are slightly longer (~0.37 slots) under value workload, and slightly shorter (~0.33 slots) under Plutus workload.
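
For context, we read 'CPU 85% spans' as the stretches of time during which CPU usage stays at or above 85%, measured in slots. A minimal sketch of how such spans could be derived from per-slot utilisation samples, assuming one sample per slot and this interpretation of the metric (the actual benchmarking pipeline may sample and aggregate differently):

```haskell
-- Sketch: derive "CPU 85% spans" from per-slot CPU utilisation samples.
-- Assumption (not from the report): one sample per slot; a span is a maximal
-- run of consecutive slots in which utilisation stays at or above the threshold.
import Data.List (group)

-- | Lengths (in slots) of all maximal runs with utilisation >= threshold.
cpuSpans :: Double -> [Double] -> [Int]
cpuSpans threshold = map length . filter head . group . map (>= threshold)

-- | Mean span length, the figure compared across runs ("~0.37 slots longer").
meanSpan :: [Int] -> Double
meanSpan [] = 0
meanSpan xs = fromIntegral (sum xs) / fromIntegral (length xs)

main :: IO ()
main = do
  let samples = [0.90, 0.92, 0.40, 0.88, 0.86, 0.87, 0.30]  -- hypothetical data
  print (cpuSpans 0.85 samples)            -- [2,3]
  print (meanSpan (cpuSpans 0.85 samples)) -- 2.5
```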

Caveat: Individual metrics can't be evaluated in isolation; the resource usage profile as a whole provides insight into the system's performance and responsiveness.

Forging Loop

  1. We can observe a clear improvement in Mempool snapshotting by 9ms or 16% (2ms or 8% under Plutus workload).
  2. Self-Adoption time improves by 4ms or 5% (and remains virtually unchanged under Plutus workload).
  3. Hence a block producer is able to announce a new header 10ms or 9% earlier into the slot (1ms or 2% under Plutus workload).

The metric 'Slot start to announced' (see attachments) is cumulative, and demonstrates how far into a slot the block producing node first announces the new header.
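
As an illustration, a cumulative metric like this can be read as the sum of the durations of the forging-loop phases that precede the announcement. The sketch below is an assumption for illustration only; the phase breakdown and all values are hypothetical, not taken from the report or the pipeline's actual definition:

```haskell
-- Sketch: 'Slot start to announced' viewed as the sum of forging-loop phase
-- durations. The phase breakdown and the values below are illustrative
-- assumptions, not figures or definitions taken from the report.

-- | Hypothetical phase durations in milliseconds.
forgingPhases :: [(String, Double)]
forgingPhases =
  [ ("Block context acquisition", 25)
  , ("Ledger ticking",            30)
  , ("Mempool snapshotting",      47)
  , ("Block forging",              5)
  , ("Self-adoption",             76)
  ]

-- | How far into the slot the new header is first announced.
slotStartToAnnounced :: [(String, Double)] -> Double
slotStartToAnnounced = sum . map snd

main :: IO ()
main = print (slotStartToAnnounced forgingPhases)  -- 183.0 (ms into the slot)
```

Under this reading, an improvement in any single phase, such as mempool snapshotting, moves the announcement correspondingly earlier into the slot.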

Peer propagation

  1. Under value workload, Fetch duration and Fetched to Sending improve slightly by 3ms (1%) and 2ms (4%) respectively.
  2. Under Plutus workload, Fetched to Sending has a slightly longer delay - 2ms (or 5%).

End-to-end propagation

This metric encompasses block diffusion and adoption across specific percentages of the benchmarking cluster, with 0.80 adoption meaning adoption on 80% of all cluster nodes.
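
A minimal sketch of how such a cluster adoption centile could be computed from per-node adoption times, assuming one adoption timestamp per node measured relative to the forger's slot start (the actual analysis pipeline may differ):

```haskell
-- Sketch: time until a given fraction of the cluster has adopted a block.
-- Assumption (not from the report): one adoption time per node, in ms after
-- the start of the slot in which the block was forged.
import Data.List (sort)

-- | Adoption time at a given cluster fraction, e.g. 0.80 = 80% of all nodes.
adoptionCentile :: Double -> [Double] -> Double
adoptionCentile frac times = sort times !! idx
  where
    n   = length times
    idx = min (n - 1) (ceiling (frac * fromIntegral n) - 1)

main :: IO ()
main = do
  -- Hypothetical adoption times for one block across a 52-node cluster.
  let times = map fromIntegral [400, 410 .. 910] :: [Double]
  print (adoptionCentile 0.80 times)  -- 810.0: 80% of nodes adopted by then
```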

  1. Under value workload, cluster adoption times exhibit a small 1% - 3% improvement across all percentiles.
  2. Under Plutus workload, they show a small 1% - 2% increase across all percentiles (except the 80th).

Conclusion

  1. We could not detect any regressions or performance risks to the network on 10.4.1.
  2. There is a small and reasonable price to pay in RAM usage for adding the LedgerDB abstraction, which enables exchangeable backing store implementations.
  3. On the other hand, CPU usage is reduced slightly by use of the in-memory backing store.
  4. 10.4.1 is beneficial in all cases for block production metrics; specifically, block producers will be able to announce new headers earlier into the slot.
  5. Network diffusion and adoption metrics vary only slightly and indicate 10.4.1 will deliver network performance comparable to 10.3.1.

Attachments

Full comparison for value-only workload, PDF downloadable here.

Full comparison for Plutus workload, PDF downloadable here.

NB. The benchmarks for 10.4.1 were performed on tag 10.4.0. The patch version bump did not include changes relevant to performance; thus, measurements performed on 10.4.0 remain valid. The same holds for 10.3.1 and 10.3.0.

· 5 min read
Michael Karg

Setup

As part of the release benchmarking cycle, we're comparing benchmarking runs for 3 different versions of cardano-node:

  • 10.2 - baseline from the previous release (built with GHC8.10.7)
  • 10.3.0-ghc8107 - the current release built with GHC8.10.7
  • 10.3.0-ghc965 - the current release built with GHC9.6.5

For this benchmark, we're gathering various metrics under 2 different workloads:

  1. value-only: Each transaction consumes 2 inputs and creates 2 outputs, changing the UTxO set. This workload produces full blocks (> 80kB) exclusively.
  2. Plutus: Each transaction contains a Plutus script exhausting the per-tx execution budget. This workload produces small blocks (< 3kB) exclusively.

Benchmarking is performed on a cluster of 52 block producing nodes spread across 3 different AWS regions, interconnected using a static, restricted topology. All runs were performed in the Conway era.

10.3.1 supports two compiler versions, which will be taken into account when comparing performance of different builds of that release.

Observations

These benchmarks are about evaluating specific corner cases in a constrained environment that allows for reliable reproduction of results; they're not trying to directly recreate the operational conditions on Mainnet.

Resource Usage

  1. 10.3.1 exhibits a clear reduction in Process CPU usage, more prominent under value workload:
    • value workload: 10% with GHC8, and 24% with GHC9.
    • Plutus workload: 4% with GHC8, and 6% with GHC9.
  2. There also is a reduction in RAM usage, more prominent under Plutus workload:
    • value workload: 1% or ~54MiB with GHC8, and 6% or ~574MiB with GHC9.
    • Plutus workload: 14% or ~1.2GiB with GHC9 only.
  3. Minor GCs and Allocation rate both drop on 10.3.1, more significantly under value workload:
    • value workload: 11% each with GHC8, and 24% each with GHC9.
    • Plutus workload: 3% and 1% with GHC8; 5% and 4% with GHC9.
  4. Under value workload, CPU 85% spans increase by 45% with GHC8, but only by 14% with GHC9.
  5. Under Plutus workload, those spans decrease by 5% with GHC8; even by 19% with GHC9.

Caveat: Individual metrics can't be evaluated in isolation; the resource usage profile as a whole provides insight into the system's performance and responsiveness.

Forging Loop

  1. Under value workload, several block production metrics improve clearly on 10.3.1, most prominently Mempool Snapshotting.
  2. With GHC8, the improvement is 23%, with further significant improvements in Adoption time (11%) and Ledger ticking (10%).
  3. With GHC9, the improvement is 27%, with further significant improvements in Adoption time (10%) and Ledger ticking (17%).
  4. Under value workload, this enables a block producer to announce a header earlier into the slot, namely by 23ms (GHC8) and by 28ms (GHC9).
  5. Under Plutus workload, Adoption time increases by 3ms (6%) with GHC8, but decreases by 8ms (15%) with GHC9.
  6. Furthermore, there are no significant changes to the header announcement timing.

The metric 'Slot start to announced' (see attachments) is cumulative, and demonstrates how far into a slot the block producing node first announces the new header.

Peer propagation

  1. Under value workload only, we observe an increase in Block Fetch duration: 7ms (2%) with GHC8, and 23ms (7%) with GHC9.
  2. Block adoption times on the peers improve clearly: 11ms (12%) with GHC8, and 12ms (14%) with GHC9.
  3. Under Plutus workload, however, similarly to the block producer, adoption times increase by 3ms (6%) with GHC8, but decrease by 7ms (13%) with GHC9.

End-to-end propagation

This metric encompasses block diffusion and adoption across specific percentages of the benchmarking cluster, with 0.80 adoption meaning adoption on 80% of all cluster nodes.

  1. Under value workload, cluster adoption times on 10.3.1 are largely unchanged.
  2. With GHC8, there are 5% and 3% improvements in the 50th and 100th centiles; with GHC9, there's a small 3% improvement in the 50th centile.
  3. Under Plutus workload, with GHC8, there's a moderate 6% increase in cluster adoption times in the 100th centile.
  4. With GHC9, however, there's a small 2% improvement in all but the 100th centile.

Conclusion

  1. For 10.3.1 we could not detect any performance risks or regressions.
  2. Improving resource usage was a stated goal for the 10.3 release; this could be confirmed via measurements for CPU and RAM usage as well as CPU spikes.
  3. 10.3.1 achieves network performance comparable to 10.2.1 while using clearly fewer system resources - for both compiler versions.
  4. Several key metrics improve on 10.3.1: Block producers announce a new header sooner into the slot; we observe lower adoption times (GHC9 only).
  5. The GHC9.6.5 build has demonstrable performance advantages over the GHC8.10.7 build; especially the Plutus interpreter seems to gain considerably from using GHC9. For those reasons we now recommend GHC9.6.x for production builds.

Attachments

Full report for value-only workload, PDF downloadable here.

Full report for Plutus workload, PDF downloadable here.

NB. The benchmarks for 10.3.1 were performed on tag 10.3.0. The patch version bump did not include changes relevant to performance; thus, measurements performed on 10.3.0 remain valid.

· 3 min read
Michael Karg

Setup

This report compares benchmarking runs for 3 different settings of the Plutus memory execution budget:

  • loop-memx1 - current mainnet memory execution budget
  • loop-memx1.5 - 1.5 x current mainnet memory execution budget
  • loop-memx2 - 2 x current mainnet memory execution budget

For this comparison, we gather various metrics under the Plutus workload used in release benchmarks: Each block produced during the benchmark contains 4 identical script transactions calibrated to fully exhaust the memory execution budget. Thus, script execution is constrained by the memory budget limit in every case. The workload produces small blocks (< 3kB) exclusively.
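
For context on that calibration: with the mainnet execution limits in force at the time (roughly 14 million memory units per transaction and 62 million per block; these figures are quoted from memory for illustration and are not part of this report), a block can accommodate at most 4 such budget-exhausting transactions, since 4 x 14M = 56M stays within the 62M block limit while a fifth transaction would exceed it.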

Benchmarking is performed on a cluster of 52 block producing nodes spread across 3 different AWS regions, interconnected using a static, restricted topology. Node version 10.2 was used, built with GHC8.10.7.

Observations

Resource Usage

  1. Scaling the memory budget impacts Allocation Rate and Minor GCs. 1.5 x the budget results in rises of 5% and 6% respectively; doubling the budget yields corresponding rises of 10% and 11%.
  2. Those increases seem to correlate linearly with the raised memory budget (see the sketch after this list).
  3. The effect on CPU usage is almost negligible: a 1% (or 3%, for doubling the budget) increase of Process CPU.
  4. The Node process RAM footprint is unaffected.
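
A small sketch of the linear relationship suggested above, fitted only to the data points stated in this list (an illustrative extrapolation, not a validated performance model):

```haskell
-- Sketch: linear extrapolation of the Allocation Rate increase against the
-- memory budget scaling factor, anchored on the stated observations:
-- factor 1.0 -> +0%, factor 1.5 -> +5%, factor 2.0 -> +10%.
-- Illustrative only; not a validated model of node performance.

-- | Predicted Allocation Rate increase (percent) for a given budget factor,
--   assuming the observed ~10% increase per doubling continues linearly.
predictedIncrease :: Double -> Double
predictedIncrease factor = (factor - 1.0) * 10.0

main :: IO ()
main =
  mapM_ (\f -> putStrLn (show f ++ "x budget -> +" ++ show (predictedIncrease f) ++ "%"))
        [1.0, 1.5, 2.0, 3.0]  -- 3.0 is an extrapolation beyond the measured budgets
```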

Caveat: Individual metrics can't be evaluated in isolation; the resource usage profile as a whole provides insight into the system's performance and responsiveness.

Forging Loop

  1. Scaling the memory budget has significant impact on block adoption time only.
  2. Scaling by factor 1.5 leads to a 14ms (or 25%) increase, whereas factor 2 leads to 28ms (49%).

Peer propagation

  1. As on the block producer, scaling the memory budget has a significant impact on block adoption times only.
  2. Scaling by factor 1.5 leads to a 15ms (or 26%) increase, whereas factor 2 leads to 28ms (48%).

End-to-end propagation

This metric encompasses block diffusion and adoption across specific percentages of the benchmarking cluster, with 0.80 adoption meaning adoption on 80% of all cluster nodes.

  1. 1.5 x the memory budget results in a slight increase of 19ms - 22ms in cluster adoption times (4% - 5%).
  2. 2 x the memory budget results in a moderate 27ms - 34ms increase (5% - 7%, with 9% in the 50th centile).

Conclusion

These measurements outline the headroom for raising the memory budget, along with the expected performance impact:

  1. Block adoption time is the only metric that's affected significantly, increasing both on the forger and the peers by the same extent.
  2. These increases seem to correspond linearly with raising the memory budget. This gives excellent predictability of the performance impact.
  3. As expected, more allocations happen; we can observe the same linear correspondence here as well.
  4. It has to be pointed out that block diffusion is only slightly affected by changing the execution budget: Due to pipelining, announcing and (re-)sending a block precedes adoption in most cases.
  5. As such, regarding absolute cluster adoption times, measurements taken with either budget adjustment do not exhibit performance risks to the network. They do illustrate, however, the performance cost of those budget adjustments.

Attachment

Full report PDF downloadable here.

· 3 min read
Michael Karg

Setup

This report compares benchmarking runs for 2 different flavours of cardano-node:

  • 10.2-regular - regular Node performance baseline from the 10.2.x release benchmarks.
  • 10.2-utxohd - the UTxO-HD build of the Node based on that same version.

For this benchmark, we're gathering various metrics under the value-only workload used in release benchmarks: Each transaction consumes 2 inputs and creates 2 outputs, changing the UTxO set. This workload produces full blocks (> 80kB) exclusively. Moreover, it's the workload that puts the most stress on the UTxO set. Thus, it's the most meaningful workload when it comes to benchmarking UTxO-HD.

We target the in-memory backing store of UTxO-HD - LedgerDB V2 in this case. The on-disk backend is not used.

Benchmarking is performed on a cluster of 52 block producing nodes spread across 3 different AWS regions, interconnected using a static, restricted topology.

Observations

Resource Usage

  1. With UTxO-HD's in-memory backend, the memory footprint increases slightly by 3%.
  2. Process CPU usage is moderately reduced by 9% with UTxO-HD.
  3. Additionally, CPU 85% spans decrease in duration by 24% (~1.1 slots).

Caveat: Individual metrics can't be evaluated in isolation; the resource usage profile as a whole provides insight into the system's performance and responsiveness.

Forging Loop

  1. Block context acquisition improves by 3ms (or 11%), while Ledger ticking takes 3ms (or 10%) longer.
  2. Creating a mempool snapshot is significantly faster - by 16ms (or 21%).
  3. As a result, a UTxO-HD block producing node is able to announce a new header 17ms (or 12%) earlier into a slot.
  4. Additionally, adoption time on the forger is slightly improved - by 4ms (or 5%).

Peer propagation

  1. Block fetch duration increases moderately by 13ms or 4%.
  2. Adoption times on the peers improve very slightly - by 2ms or 2%.

End-to-end propagation

This metric encompasses block diffusion and adoption across specific percentages of the benchmarking cluster, with 0.80 adoption meaning adoption on 80% of all cluster nodes.

  1. There is no significant difference in cluster adoption times between regular and UTxO-HD node.

Conclusion

Regarding the UTxO-HD build using the in-memory LedgerDB V2 backend, we can conclude that:

  1. it is lighter on CPU usage compared to the regular node, albeit requiring just slightly more RAM.
  2. it poses no performance risk to block producers; on the contrary, the changes in forging loop metrics seem favourable compared to the regular node.
  3. network performance would be expected to be on par with the regular node.
  4. even under stress, there is no measurable performance regression compared to the regular node.
  5. as a consequence of the above, performance-wise, it's a viable replacement for the regular in-memory solution.

Attachment

Full report for value-only workload, PDF downloadable here.

· 3 min read
Michael Karg

Setup

As part of the release benchmarking cycle, we're comparing benchmarking runs for 2 different versions of cardano-node:

  • 10.1.4 - baseline from a previous mainnet release
  • 10.2.1 - the current release

For this benchmark, we're gathering various metrics under 2 different workloads:

  1. value-only: Each transaction consumes 2 inputs and creates 2 outputs, changing the UTxO set. This workload produces full blocks (> 80kB) exclusively.
  2. Plutus: Each transaction contains a Plutus script exhausting the per-tx execution budget. This workload produces small blocks (< 3kB) exclusively.

Benchmarking is performed on a cluster of 52 block producing nodes spread across 3 different AWS regions, interconnected using a static, restricted topology. All runs were performed in the Conway era.

Observations

These benchmarks are about evaluating specific corner cases in a constrained environment that allows for reliable reproduction of results; they're not trying to directly recreate the operational conditions on Mainnet.

Resource Usage

  1. CPU usage increases moderately by 12% under value, and very slightly by 2% under Plutus workload.
  2. CPU 85% spans increase by 14% (~0.6 slots) under value workload, but decrease by 6% (~0.8 slots) under Plutus workload.
  3. Only under value workload, we observe a slight increase in Allocation rate and Minor GCs of 9% and 8% respectively.

Caveat: Individual metrics can't be evaluated in isolation; the resource usage profile as a whole provides insight into the system's performance and responsiveness.

Forging Loop

  1. Adoption time on the forger improves by 3ms (or 4%) - and 5ms (or 9%) under Plutus workload.
  2. Block context acquisition takes 3ms (or 12%) longer under value workload.
  3. Under Plutus workload only, ledger ticking improves by 3ms (or 12%).

The metric 'Slot start to announced' (see attachments) is cumulative, and demonstrates how far into a slot the block producing node first announces the new header.

Peer propagation

  1. Block fetch duration improves clearly by 16ms (or 4%) under value-only workload.
  2. Under Plutus workload, we can measure an improvement by 4ms (or 7%) for adoption times on the peers.

End-to-end propagation

This metric encompasses block diffusion and adoption across specific percentages of the benchmarking cluster, with 0.80 adoption meaning adoption on 80% of all cluster nodes.

As a result of the above, 10.2.1 exhibits:

  1. a slight 3% improvement in cluster adoption times in the 80th centile and above under value workload.
  2. a near-jitter 1% - 3% improvement in cluster adoption times under Plutus workload.

Conclusion

  1. We could not detect any significant regressions, or performance risks, on 10.2.1.
  2. 10.2.1 comes with slightly increased CPU usage, and no changes to RAM footprint.
  3. Diffusion metrics very slightly improve - mainly due to block fetch being more efficient for full blocks, and adoption for blocks exclusively containing Plutus transactions.
  4. This points to network performance of 10.2.1 being on par with or very slightly better than 10.1.4.

Attachments

Full report for value-only workload, PDF downloadable here.

Full report for Plutus workload, PDF downloadable here.

NB. The benchmarks for 10.2.1 were performed on tag 10.2.0. The patch version bump did not include changes relevant to performance; thus, measurements and observations performed on 10.2.0 remain valid.

· 3 min read
Michael Karg

Setup

As part of the release benchmarking cycle, we're comparing benchmarking runs for 2 different versions of cardano-node:

  • 10.1.1 - baseline from a previous mainnet release
  • 10.1.4 - the current mainnet release

For this benchmark, we're gathering various metrics under 2 different workloads:

  1. value-only: Each transaction consumes 2 inputs and creates 2 outputs, changing the UTxO set. This workload produces full blocks (> 80kB) exclusively.
  2. Plutus: Each transaction contains a Plutus script exhausting the per-tx execution budget. This workload produces small blocks (< 3kB) exclusively.

Benchmarking is performed on a cluster of 52 block producing nodes spread across 3 different AWS regions, interconnected using a static, restricted topology. All runs were performed in the Conway era.

Observations

These benchmarks are about evaluating specific corner cases in a constrained environment that allows for reliable reproduction of results; they're not trying to directly recreate the operational conditions on Mainnet.

Resource Usage

  1. CPU 85% spans slightly increase by 6% or ~0.2 slots (26% or ~2.9 slots under Plutus workload).
  2. We can observe a tiny increase in memory usage by 1-2% (132-160 MiB).

Caveat: Individual metrics can't be evaluated in isolation; the resource usage profile as a whole provides insight into the system's performance and responsiveness.

Forging Loop

  1. Under value workload, Ledger Ticking and Self Adoption exhibit a very slight increase (2ms each).
  2. Block Context Acquisition has improved by 2ms.
  3. Under Plutus workload, there are no significant changes to forger metrics.

The metric 'Slot start to announced' (see attachments) is cumulative, and demonstrates how far into a slot the block producing node first announces the new header.

Peer propagation

  1. There's a minor increase of 1% (3ms) in Block Fetch duration under value workload only.
  2. Under Plutus workload, we can measure a small improvement by 2% for adoption times on the peers.

End-to-end propagation

This metric encompasses block diffusion and adoption across specific percentages of the benchmarking cluster, with 0.80 adoption meaning adoption on 80% of all cluster nodes.

As a result of the above, on 10.1.4 we can observe:

  1. a tiny increase in cluster adoption times of 1%-2% in the 80th centile and above under value workload.
  2. an improvement in cluster adoption times of 3%-4% in the tail end (95th centile and above) under Plutus workload.

Conclusion

  1. For 10.1.4, we could not detect any regressions or performance risks.
  2. All increases or decreases in forger and peer metrics are 3ms or less. This indicates network performance of 10.1.4 will very closely match that of 10.1.1 and subsequent patch releases.
  3. There's no significant change in the resource usage pattern. The increased CPU 85% spans tend to barely manifest when the system is under heavy load (value workload); as such, they pose no cause for concern.

Attachments

Full report for value-only workload, PDF downloadable here.

Full report for Plutus workload, PDF downloadable here.

NB. The benchmarks for 10.1.1 were performed on tag 10.0.0-pre. The minor version bump did not include changes relevant to performance; thus, measurements taken on 10.0.0-pre remain a valid baseline.

· 4 min read
Michael Karg

Setup

As part of the release benchmarking cycle, we're comparing benchmarking runs for 2 different versions of cardano-node:

  • 9.2.0 - baseline from a previous mainnet release
  • 10.1.1 - the current mainnet release

For this benchmark, we're gathering various metrics under 3 different workloads:

  1. value-only: Each transaction consumes 2 inputs and creates 2 outputs, changing the UTxO set. This workload produces full blocks (> 80kB) exclusively.
  2. Plutus: Each transaction contains a Plutus script exhausting the per-tx execution budget. This workload produces small blocks (< 3kB) exclusively.
  3. value+voting: On top of the above value workload, this one has DReps vote on and ratify governance actions - forcing additional computation for vote tallying and proposal enactment.

Benchmarking is performed on a cluster of 52 block producing nodes spread across 3 different AWS regions, interconnected using a static, restricted topology. All runs were performed in the Conway era.

Observations

These benchmarks are about evaluating specific corner cases in a constrained environment that allows for reliable reproduction of results; they're not trying to directly recreate the operational conditions on Mainnet.

Resource Usage

  1. 10.1.1 shows an improvement of 4% (8% under Plutus workload) in Process CPU usage.
  2. Allocation Rate improves by 8% (11% under Plutus workload), while Heap Size remains unchanged.
  3. CPU 85% spans decrease by 18% (5% under Plutus workload).
  4. Compared to value-only workload, ongoing voting leads to a slight increase of 5% in Process CPU usage.

Caveat: Individual metrics can't be evaluated in isolation; the resource usage profile as a whole provides insight into the system's performance and responsiveness.

Forging Loop

  1. Under Plutus workload, 10.1.1 exhibits a formidable speedup of 70ms in the forging loop - due to mempool snapshots being produced much more quickly.
  2. Under value workload, there are no significant changes to forger metrics.
  3. With voting added on top of the value workload, we can observe mempool snapshots and adoption time on the block producer rise by 10ms each.

The metric 'Slot start to announced' (see attachments) is cumulative, and demonstrates how far into a slot the block producing node first announces the new header.

Peer propagation

  1. Block Fetch duration increases slightly by 16ms (or 5%) under value workload.
  2. Under Plutus workload, there are no significant changes to peer-related metrics.
  3. With the additional voting workload, peer adoption times rise by 12ms on average - confirming the observation for adoption time on the block producer.

End-to-end propagation

This metric encompasses block diffusion and adoption across specific percentages of the benchmarking cluster, with 0.80 adoption meaning adoption on 80% of all cluster nodes.

  1. 10.1.1 exhibits a slight increase of 2% - 3% in cluster adoption times under value workload.
  2. Under Plutus workload however, we observe significant improvement of 18% up to the 50th centile, and 9% - 13% in the 80th centile and above.
  3. While the former is due to slightly increased Block Fetch duration, the latter is the consequence of much quicker mempool snapshots involving Plutus transactions.
  4. Submitting the additional voting workload, we can observe a consistent 4% - 6% increase in cluster adoption times across all centiles.

Conclusion

  • We do not detect any performance regression in 10.1.1 compared to 9.2.0.
  • On the contrary - 10.1.1 is lighter on the Node process resource usage overall.
  • Improved forging and diffusion timings can be expected for blocks heavy on Plutus transactions.
  • Stressing the governance / voting capabilities of the Conway ledger lets us ascertain an (expected) performance cost of voting.
  • This cost has been demonstrated to be reasonable, and not to contain lurking performance risks to the system.
  • It is expected to manifest only during periods of heavy vote tallying / proposal enactment, slightly affecting block adoption times.

NB. The same number of DReps is registered for each workload. However, only under value+voting do they become active by submitting votes. This requires an increased UTxO set size, so it uses a baseline separate from value-only, resulting in slightly different absolute values.

Contact

As for publishing such benchmarking results, we are aware that more context and detail may be needed with regard to specific metrics or benchmarking methodology.

We are still looking to gather questions, both general and specific, so that we can provide a suitable FAQ and possibly improve presentation in the future.

Attachments

Full report for value-only workload, PDF downloadable here.

Full report for Plutus workload, PDF downloadable here.

Full report for value+voting workload, PDF downloadable here.

NB. The release benchmarks for 10.1.1 were performed on tag 10.0.0-pre. The minor version bump did not include changes relevant to performance; thus, measurements taken on 10.0.0-pre remain valid.

· 3 min read
Michael Karg

Setup

As part of the release benchmarking cycle, we're comparing benchmarking runs for 3 different versions of cardano-node:

  • 8.7.2 - baseline from the previous mainnet release
  • 8.8.0 - an intermediate reference point
  • 8.9.0 - the next mainnet release

For each version, we're gathering various metrics under 2 different workloads:

  1. value-only: Each transaction consumes 2 inputs and creates 2 outputs, changing the UTxO set. This workload produces full blocks (> 80kB) exclusively.
  2. Plutus: Each transaction contains a Plutus script exhausting the per-tx execution budget. This workload produces small blocks (< 3kB) exclusively.

Benchmarking is performed on a cluster of 52 block producing nodes spread across 3 different AWS regions, interconnected using a static, restricted topology. All runs were performed in the Babbage era.

Observations

These benchmarks are about evaluating specific corner cases in a constrained environment that allows for reliable reproduction of results; they're not trying to directly recreate the operational conditions on Mainnet.

The observations stated refer to the direct comparison between the 8.7.2 and 8.9.0 versions.

Resource Usage

  1. Overall CPU usage exhibits a small to moderate (5% - 8%) increase.
  2. Memory usage is very slightly decreased by 1%.

Caveat: Individual metrics can't be evaluated in isolation; the resource usage profile as a whole provides insight into the system's performance and responsiveness.

Forging Loop

  1. For full blocks, Mempool Snapshotting improves by 4% (or 3ms).
  2. For small blocks, Self Adoption times improve by 8% (or 4ms).
  3. All other forger metrics do not exhibit significant change.

The metric 'Slot start to announced' (see attachments) is cumulative, and demonstrates how far into a slot the block producing node first announces the new header.

Peer propagation

  1. For full blocks, Block Fetch duration shows a notable improvement by 10ms (or 3%).

End-to-end propagation

This metric encompasses block diffusion and adoption across specific percentages of the benchmarking cluster, with 0.80 adoption meaning adoption on 80% of all cluster nodes.

End-to-end propagation times on 8.9.0 exhibit a small improvement by 2% across all centiles for full blocks, whereas they remain largely unchanged for small blocks.

Conclusion

  • The performance changes observed between 8.9.0 and 8.7.2 are only minor - with 8.9.0 slightly improving on 8.7.2. Therefore, we'd expect 8.9.0 Mainnet performance to be akin to 8.7.2.
  • We have demonstrated that no performance regression has been introduced in 8.9.0.

Contact

As for publishing such benchmarking results, we are aware that more context and detail may be needed with regard to specific metrics or benchmarking methodology.

We are still looking to gather questions, both general and specific, so that we can provide a suitable FAQ and possibly improve presentation in the future.

Attachments

Full report for value-only workload, PDF downloadable here.

Full report for Plutus workload, PDF downloadable here.

NB. Mainnet release 8.7.3 did not include any performance-related changes; measurements taken on 8.7.2 remain valid.

· 3 min read
Michael Karg

Setup

As part of the release benchmarking cycle, we're comparing benchmarking runs for 2 different versions of cardano-node:

  • 8.9.0 - baseline from the previous mainnet release
  • 8.9.1 - the next mainnet release

For each version, we're gathering various metrics under 2 different workloads:

  1. value-only: Each transaction consumes 2 inputs and creates 2 outputs, changing the UTxO set. This workload produces full blocks (> 80kB) exclusively.
  2. Plutus: Each transaction contains a Plutus script exhausting the per-tx execution budget. This workload produces small blocks (< 3kB) exclusively.

Benchmarking is performed on a cluster of 52 block producing nodes spread across 3 different AWS regions, interconnected using a static, restricted topology. All runs were performed in the Babbage era.

Observations

These benchmarks are about evaluating specific corner cases in a constrained environment that allows for reliable reproduction of results; they're not trying to directly recreate the operational conditions on Mainnet.

Resource Usage

  1. We can observe an overall decrease in CPU usage (2% - 4%); only GC CPU usage under value workload increases by 3%.
  2. Under value workload only, Allocation rate is very slightly decreased (1%) with no change to Heap Size.

Caveat: Individual metrics can't be evaluated in isolation; the resource usage profile as a whole provides insight into the system's performance and responsiveness.

Forging Loop

  1. Mempool Snapshot duration increases slightly by 2ms under value workload.
  2. Self-Adoption time increases by 3ms.
  3. All other forger metrics do not exhibit significant change.

The metric 'Slot start to announced' (see attachments) is cumulative, and demonstrates how far into a slot the block producing node first announces the new header.

Peer propagation

  1. Under value workload only, Block Fetch duration and Fetched to Sending show a slight increase of 2ms each.

End-to-end propagation

This metric encompasses block diffusion and adoption across specific percentages of the benchmarking cluster, with 0.80 adoption meaning adoption on 80% of all cluster nodes.

End-to-end propagation times on 8.9.1 exhibit a small increase by 1% - 2% for full blocks, while remaining virtually unchanged for small blocks.

Conclusion

  • The performance changes measured between 8.9.1 and 8.9.0 are very minor. Mainnet performance of 8.9.1 is expected to be akin to 8.9.0.
  • We have not observed any performance regression being introduced in 8.9.1.

Contact

As for publishing such benchmarking results, we are aware that more context and detail may be needed with regard to specific metrics or benchmarking methodology.

We are still looking to gather questions, both general and specific, so that we can provide a suitable FAQ and possibly improve presentation in the future.

Attachments

Full report for value-only workload, PDF downloadable here.

Full report for Plutus workload, PDF downloadable here.

· 3 min read
Michael Karg

Setup

As part of the release benchmarking cycle, we're comparing benchmarking runs for 2 different versions of cardano-node:

  • 8.9.1 - baseline from a previous mainnet release
  • 8.9.3 - the current mainnet release

For each version, we're gathering various metrics under 2 different workloads:

  1. value-only: Each transaction consumes 2 inputs and creates 2 outputs, changing the UTxO set. This workload produces full blocks (> 80kB) exclusively.
  2. Plutus: Each transaction contains a Plutus script exhausting the per-tx execution budget. This workload produces small blocks (< 3kB) exclusively.

Benchmarking is performed on a cluster of 52 block producing nodes spread across 3 different AWS regions, interconnected using a static, restricted topology. All runs were performed in the Babbage era.

Observations

These benchmarks are about evaluating specific corner cases in a constrained environment that allows for reliable reproduction of results; they're not trying to directly recreate the operational conditions on Mainnet.

Resource Usage

  1. Under value workload, CPU usage increases slightly on 8.9.3: 4% for Process, 3% for Mutator and 8% for GC.
  2. Additionally, Allocation rate and minor GCs increase slightly by 3% each.
  3. Under Plutus workload only, the GC live dataset increases by 10% or 318MB.
  4. CPU 85% spans increase by 14% of slot duration under value workload, whereas they shorten by 5% of slot duration under Plutus workload.

Caveat: Individual metrics can't be evaluated in isolation; the resource usage profile as a whole provides insight into the system's performance and responsiveness.

Forging Loop

  1. There are no significant changes to metrics related to block forging.

The metric 'Slot start to announced' (see attachments) is cumulative, and demonstrates how far into a slot the block producing node first announces the new header.

Peer propagation

  1. Block Fetch duration improves by 7ms (or 2%) under value workload, and by 4ms (or 3%) under Plutus workload.
  2. Under Plutus workload, Fetched to sending improves by 2ms (or 5%).

End-to-end propagation

This metric encompasses block diffusion and adoption across specific percentages of the benchmarking cluster, with 0.80 adoption meaning adoption on 80% of all cluster nodes.

  1. Under value workload, cluster adoption times exhibit a minor improvement (1%) up to the 80th centile on 8.9.3.
  2. Under Plutus workload, we can observe a minor improvement overall (1% - 2%), whilst full adoption is unchanged.

Conclusion

  • The performance changes measured between 8.9.3 and 8.9.1 are very minor, with 8.9.3 improving slightly over 8.9.1.
  • Mainnet performance of 8.9.3 is expected to be akin to 8.9.1.
  • We have not observed any performance regression being introduced in 8.9.3.

Contact

As for publishing such benchmarking results, we are aware that more context and detail may be needed with regard to specific metrics or benchmarking methodology.

We are still looking to gather questions, both general and specific, so that we can provide a suitable FAQ and possibly improve presentation in the future.

Attachments

Full report for value-only workload, PDF downloadable here.

Full report for Plutus workload, PDF downloadable here.

NB. The baseline for 8.9.1 had to be re-established due to changes in the underlying network infrastructure. This means absolute values may differ from previous measurements taken for that version.