Skip to main content

45 posts tagged with "performance-tracing"

View All Tags

· 3 min read
Michael Karg

High level summary

  • Benchmarking: Release benchmarks for 8.9.1 have been performed and analysed.
  • Development: We've implemented a benchmarking setup for UTxO-HD's LMDB (on-disk) backend.
  • Workbench: The now modular, nix-based genesis creation has been merged to master; DRep delegation and integration of a new cardano-cli command are ongoing.
  • Tracing: Benchmarking the new handle registry feature in cardano-tracer is complete; quality-of-life improvements to Prometheus output.
  • UTxO Growth: We've adjusted our framework to support running UTxO scaling benchmarks on both a single node and a cluster.
  • Nomad cluster: new multi-cluster support with the capability to quickly adjust to changes in deployed hardware

Low level overview

Benchmarking

We've performed a full set of release benchmarks for Node 8.9.1. Comparing with release 8.9.0, we could not detect any performance risks for that version.

Development

In context of UTxO scaling, we want to assess the feasability of the current on-disk solution (which is LMDB) of a UTxO-HD enabled node. Using that, the UTxO set will be kept in live tables and snapshots on disk, significantly reducing memory requirements.

We've implemented a benchmark setting, and a node service configuration, supporting direct disk access to a dedicated device which can be initialized with optimized file system and mount settings. It's purpose is to serve as storage for the highly performance-critical disk I/O of the LMDB component.

Workbench

Our automation for creating all flavours of geneses has seen cleanup and refactoring - which has been merged to master. It can now use a more principled, and rigorously checked, modular approach to define, create and cache the desired genesis files.

Working on integrating new cardano-cli functionality in our automation is ongoing. The performance workbench will support a different, and updated, CLI command which will allow injection of DRep delegations into genesis.

Tracing

Benchmarking cardano-tracer's new handle registry feature has been performed and evaluated. We're satisfied with seeing clear performance improvements along with cleaner code, and much better test coverage. Especially allocation rate and number of garbage collections (GC) could be significantly reduced, along with the CPU time required for performing GCs. This will allow for higher trace message throughput given identiacal system resources - plus less system calls issued to the OS in the process.

Furthermore, the new tracing system is getting improvements for its Prometheus output - like providing version numbers as metrics, or annotating metrics with their type - enhancing the output's overall utility.

UTxO Growth

The performance workbench now supports profiles aimed at simulating UTxO growth both for a single node and an entire cluster. Additionally, simulating different RAM sizes in combination with specific UTxO set sizes is supported. For a single block producing node, the focus is on quick turnaround when running a benchmark, gaining insight into the node's RAM usage and possible impact on the forging loop.

The cluster profiles enable capturing block diffusion metrics as well, however they require a much longer runtime. We can now successfully benchmark the node's behaviour when dealing with UTxO set sizes 4x - 5x of current mainnet, as well as a possible change in behaviour when operating close to phsyical RAM limit due to that.

Nomad cluster

Our backend now supports allocating and deploying Nomad jobs for multiple clusters simultaneously - all while keeping existing automations operational. We've taken special precautions a cluster, as seen by the backend, can be efficiently and easily modified to reflect newly deployed, or changed, hardware. Additionally, we've added support for host volumes inside a Nomad allocation - which will be needed for benchmarking UTxO-HD's on-disk solution.

· 5 min read
Michael Karg

High level summary

  • Benchmarking: We've performed release benchmarks for Node 8.9.0. Additionally, we benchmarked different GC settings for cardano-node.
  • Development: Ongoing work on the reporting pipeline and high-level profile definitions.
  • Workbench: In conjunction with DRep delegations in genesis, we're working on adjustments to a new cardano-cli command.
  • Tracing: Test coverage for the new handle registry feature in cardano-tracer is complete.
  • UTxO Growth: Currently, we're developing a series of benchmarks targeting performance implications of increased UTxO set size.
  • Nomad cluster: Disk storage safety net; better admin access to Nomad nodes; basic backend support for more than 1 cluster; new latency service.

Low level overview

Benchmarking

We've performed a full set of release benchmarks for Node 8.9.0. Initially, we identified a performance regression in connection to ledger snapshots. This has been addressed very swiftly. Having re-run the fixed version, we could detect no performance risks in comparison with 8.7.2 / 8.7.3.

In an additional set of benchmarks, we targeted the garbage collector (GC) settings that cardano-node is built with by default. Specifically, we compared these (copying, sequential GC) as a baseline to using the parallel GC, the compacting GC and the non-moving GC - all of which are supported by GHC's runtime system. As GC always is a trade-off between space and time (and as a consequence, responsiveness); we could measure the parallel GC offering a slight increase in responsiveness at the cost of delaying some evaluations - which is suboptimal for a block forger. The compacting GC could clearly achieve a smaller RAM footprint, but only at the cost of increased CPU usage - and clearly worsened responsiveness. The non-moving GC could greatly enhance responsiveness - but increased the RAM footprint tremendously, as well as introduced delays in the forging loop. In conclusion: the existing default is still the best choice by far for cardano-node - validated both on GHC8.10.7 and GHC9.6.3.

Development

The work on moving benchmark profile, and genesis, definitions out of the bash scripting / JSON data transformation space is still ongoing. Type safety and a test suite for those tasks will allow for a much more principled approach.

The implementation of additional rendering formats and report templates for our reporting pipeline has been completed; it is currently in testing and validation phase.

Workbench

We're working on integrating new cardano-cli functionality in our automations. Injecting DRep delegations into genesis - for Conway ledger benchmarks - will require us to use a new CLI command, which differs in in output structure and options provided from the one we're using to inject stake pool delegations. This requires us to implement and additional post-processing step for backends to find everything as expected.

Furthermore, a PR has been merged which refactors and cleans up benchmarking profiles, with a focus on fine-tuning solo-node benchmarks which scrutinize a single cardano-node process.

Tracing

The test suite for cardano-tracer's new handle registry feature is complete, and the new feature passes all tests. At the moment, we're preparing it for merging into master.

UTxO Growth

We're developing a series of benchmarks that will provide insight into possible changes to the Node's performance characteristics given different UTxO set sizes and numbers of delegated wallets. What we aim to capture in these benchmarks is the system's capability to scale with UTxO growth - while simultaneously evaluating hardware requirements. The workloads will be based on existing release benchmarks, but allow for flexibility regarding UTxO set and delegations. They will target the existing in-memory solution, and at the same time permit feasability testing UTxO-HD's on-disk flavour - which does not keep the entire UTxO set in RAM permanently.

Nomad cluster

Implementation of cluster machine disk space checking and garbage collection is complete. A requirement was that no monitoring process interferes with a running benchmark, so a non-synchronous approach was chosen to guarantee enough disk space. This prevents failing runs, and thus the necessity to repeat them.

In the process, the workbench backend was equipped with a wider range of cluster commands and abstractions, which makes administrating cluster machines more flexible. This includes a new service to create a network latency matrix for deployed cluster hardware - generalizing the approach chosen during the Nomad cluster's initial validation phase. This can guarantee the validity of existing baselines in case of hardware reboots, or changes in topology.

Last not least, the backend is currently receiving an additional feature: supporting more than 1 hardware cluster. This will enable us, in the future, to benchmark on ephemeral clusters - without interfering with the hard requirements, or the schedule, of release benchmarking on our default deployment. The motivation is being able to benchmark different hardware configurations, along with varying cardano-node options and initial ledger states on a parallel schedule - also, without having to keep those clusters running at all times.

· 3 min read
Michael Karg

High level summary

  • Benchmarking: Release benchmarks for 8.8.0 have been performed; we created a local repro for a residual issue.
  • Performance: We've implemented and benchmarked two candidates investigating residual issues with GHC9.6.
  • Development: Work on the reporting pipeline is ongoing; integration of DReps into benchmarking workloads has begun.
  • Workbench: Implementation of high-level profile definition is ongoing.
  • Tracing: The handle registry feature for cardano-tracer is completed; currently in testing.
  • Nomad cluster: Increased robustness of deployment and run monitoring has been merged; work on garbage collection has started.

Low level overview

Benchmarking

We've performed a full set of release benchmarks for Node 8.8.0-pre. Comparing with release 8.7.2, we could not detect any performance risks for that version. We even saw a slight improvement in block fetch related metrics, which led to slightly improved block diffusion times.

Furthermore, we've managed to boil down a complex residual performance issue measured on the cluster to a local reproduction. This enables our DevX team, with highly specialized knowledge of GHC's compiler internals, to investigate each step in code generation and optimzation, and independently observe the effects of code changes to the affected component.

Performance

Work on the remaining performance issue with GHC9.6 led us to produce two candidates based on Node 8.7.2, benchmarking the implacations local small changes have for GHC9.6's optimizer. Though those candidates did not uncover the issue's root cause, they were able to disprove a hypothesis as to its nature, and quantify the performance impact of said small changes.

Development

Node 8.8.0 comes with capabilities to inject DReps and DRep delegations into Conway genesis. We've started work on integrating those into our automations, and setting sensible values for benchmarking. The aforementioned delegations representing a new data structure in the Conway ledger, we aim to run existing workflows extended with varying sizes of that new structure, measuring their pressure on ledger queries and operations.

Workbench

The performance workbench relies heavily on shell scripting and manipulating JSON data for a great part of its features. This approach is very effective for quick experimentation, but lacks in verifiable properties as well as accessibility for new users of workbench.

After the successful Haskell port of cluster topology creation, and verification, we're currently applying the same model in porting the entirety of benchmarking profiles to Haskell. The obvious gains are widening workbench's audience both for users and developers, as well as implementing a principled approach to all workbench data structures and transformations.

At the same time, we're porting workbench's many options to create fine-tuned geneses, following the same approach.

Tracing

We've outfitted cardano-tracer with a handle registry feature that lets the service work on file handles internally, rather than opening and closing files for each operation. The feature is completed; at the moment we're adding appropriate test cases to the service's test suite for validation of its behaviour, and for safeguarding future development.

Nomad backend

Several improvements for our cluster backend have been merged to master, increasing its overall robustness. We can now safely handle some corner cases where Nomad processes unexpectedly exited, or deployments errored out. Furthermore, an ongoing run can now reliably survive a temporary loss of heartbeat connection between Nomad client and server, without the benchmarking metrics being affected.

Currently, we're working on a reliable automation of garbage collecting old nix store entries on the cluster machines, as they fill up disk space. The design has to consider both not interfering with ongoing benchmarks, and avoiding deployment overhead caused by cleaning the store too frequently.

· 2 min read
Michael Karg

High level summary

  • Benchmarking: GHC 9.6.3 benchmarks for node 8.7.2 have been performed.
  • Development: Additional features for our reporting pipeline, while simultaneously reducing dependency footprint.
  • Tracing: Implementation for cardano-tracer to work on handles instead of files; work on New Tracing Quickstart document has begun.
  • Nomad cluster: We're preparing an upgrade to the latest Nomad version.

Low level overview

Benchmarking

We've performed a full set of GHC 9.6.3 benchmarks for node 8.7.2. For recommending GHC9.6 as a default build platform for cardano-node - from a performance perspective - we observe only one residual issue. As a way to address this, we've decided to create a reproduction benchmark targeting the affected component.

Development

Our reporting pipeline will be expanded to support a wider range of rendering formats, as well as report templates. As the pipeline is part of our workbench - and thus gets downloaded and built when entering the workbench shell - it's good practice to keep a small dependency footprint. When reworking the pipeline, we aim to simultaneously reduce dependencies.

Tracing

So far, cardano-tracer has internally been using files, or file names, for the purpose of logging trace messages it receives via forwarding. This is simple, but induces quite some overhead at runtime: files have to be opened and closed for each message. Using and managing open file handles inside cardano-tracer does remove that overhead, but unsurprisingly introduces some complexity into the application code. Currently, we're working on implementing that change.

Furthermore, we're working on a Quickstart document for the new tracing system, with end users as its intended audience. It will contain recommended production use setup(s), and how to efficiently configure and run them step by step. Additionally, it will provide a brief, but comprehensive overview over the features at the user's disposal.

Nomad backend

On the Nomad cluster, we've experienced undesired system behaviour when the heartbeat between the Nomad server and a client is interrupted temporarily - although the Nomad job itself is still 100% functional. A Nomad upgrade to the latest version promises to fix that, but it turn comes with other issues. We're currently working on adapting our automation and deployment around those known issues, before we can eventually apply the upgrade.

· 4 min read
Michael Karg

High level summary

  • Benchmarking: Release benchmarking for node 8.7.2, as well as P2P benchmarks - the latter being a premiere for the Nomad cluster.
  • Development: Work on automating the submission of Conway transactions for registration of and stake delegation to DReps is ongoing.
  • Infrastructure: We're preparing to safely retire our current benchmarking cluster.
  • Tracing: The documentation for the new tracing system is being reworked. Additionally, we've added a feature to cardano-tracer to capture performance over a long runtime.
  • Nomad cluster: The Nomad backend has been successfully equipped with support for up-to-date P2P topology, as well as deployment for Plutus script data.

Low level overview

Benchmarking

We've performed a full set of release benchmarks for node 8.7.2. From a performance perspective, it has been greenlit for mainnet release. Starting with this version, our team will publish observations alongside the original comparative analysis of benchmarks, providing insight into key metrics and resource usage. Hence, for the post on version 8.7.2, see here.

Additionally, we're running P2P versus no-P2P benchmarks on the same version. We intend to establish future baseline runs using P2P topology as the default setting. All our cluster nodes being block producers, it is crucial to establish the P2P stack does not exhibit any regression regarding block forging. Furthermore, the evidence gathered from those benchmarks forms the base for a recommended setting for P2P on mainnet block producers.

It deserves special mention that those P2P benchmarks are our first production runs with the Nomad cluster - and using the new tracing system exclusively. Having finalized all optimization rounds of the latter, and having meticulously eliminated all confounding factors from the Nomad infrastructure, we're confident in the measurements being subject to extremely low variance - which we made sure of in many past validation runs on Nomad.

Development

Orchestrating DRep actions into benchmarking workflows has opened up a fairly large design space. Currently, we're focusing on having stake delegated to DReps, in order to trigger ledger pulses for calculations particular to DRep actions. We can benchmark a possible performance impact of those pulses - even if there are no actions ongoing - as a first step.

The contributing factors to the cost of those pulses are both the number of DReps registered, and the number of delegations to them. It is still under debate which values represent a probable model for mainnet, and whether we can achieve stake delegation programmatically (i.e. by submitting transactions), or if, for large numbers, we need means to inject those delegations into genesis.

Infrastructure

With the switch of all production benchmarks to the Nomad cluster, we will retire the legacy cardano-ops cluster very shortly. Currently, we're making sure that when all its resources are released, we keep an archive for all runs performed, including all raw log data - with the oldest runs dating back as far as December 2019.

Tracing

We've outfitted the cardano-tracer service with the same kind of resource tracing machinery that's used by cardano-node - as well as created a dedicated benchmarking profile for it. It puts very little pressure on the node; instead it causes 6 nodes to emit traces at a rate of >35 messages per second, putting pressure on cardano-tracer via trace forwarding at >200 messages per second for an extended period of time. Analysis of these traces will form the ground for various optimizations of cardano-tracer that are currently being worked on.

As we aim for early 2024 to be able to recommend the new tracing system as default in production use, we're currently also reworking the comprehensive documentation to reflect all changes made over the last months.

Nomad backend

In addition to being able to deploy Plutus script data and redeemers seperately (instead of inlining them as the legacy cluster did), the Nomad cluster now supports being set up with recent P2P topology. No-P2P topology format will still be supported for occasional regression benchmarking of the P2P stack, when desired. Furthermore, we've completed porting all benchmarking profiles from the legacy cluster to Nomad.