51 posts tagged with "performance-tracing"

· 3 min read
Michael Karg

High level summary

  • Benchmarking: We've performed benchmarks and analyses for Node versions 8.9.2 and 8.10.0.
  • Development: Design phase for implementing quick queries in the analysis pipeline has begun.
  • Workbench: We're finishing up the new features for the reporting pipeline; Haskell profile definition work is ongoing.
  • Tracing: Improving the Prometheus output is ongoing; the node's build info will be accessible as a Prometheus label.
  • UTxO Growth: Our tooling has been augmented to support benchmarks starting with a non-empty chain.

Low level overview

Benchmarking

We've performed a full set of release benchmarks for Node 8.9.2. Comparing with release 8.9.1, we could not detect any performance risks for that version.

The benchmarks for 8.10.0 showed a slight improvement in the evaluation time of the block forging loop; additionally, resource usage of the cardano-node process was slightly reduced - a nice performance improvement.

Development

Our analysis pipeline is based on batch analysis of data from over 50 cluster nodes; it consumes very large amounts of trace output ex post facto, after the actual benchmark has terminated. This is very time-intensive, and not viable for observing an additional metric that you only later determine might need consideration.

We're planning to add quick queries into a benchmarking run's trace data to our analysis pipeline. These will be structured such that parameterizable, ad-hoc querying is supported. Initial tests showed that evaluation speed of such queries is fast enough to merit designing a principled, and generalized, syntax for them - and a subsequent implementation.
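
As a rough illustration of the idea - the design is still in progress, and all names below are hypothetical - such a quick query could boil down to a filter over decoded trace messages plus an aggregation over the selected values:

```haskell
import           Data.Maybe (mapMaybe)
import           Data.Text  (Text)
import qualified Data.Text  as T
import           Data.Time  (UTCTime)

-- A trace message, reduced to the fields a query might inspect.
-- (Hypothetical sketch; the real trace types are richer.)
data TraceEvent = TraceEvent
  { teNamespace :: Text         -- the message's namespace
  , teTimestamp :: UTCTime
  , teValue     :: Maybe Double -- a numeric payload, if any
  }

-- A parameterizable query: which messages to keep, and how to aggregate them.
data Query a = Query
  { qSelect :: TraceEvent -> Bool
  , qFold   :: [Double] -> a
  }

-- Evaluate a query against an already decoded stream of trace messages.
runQuery :: Query a -> [TraceEvent] -> a
runQuery (Query select fold') = fold' . mapMaybe teValue . filter select

-- Example parameterization: average value of all events under a namespace prefix.
avgUnder :: Text -> Query Double
avgUnder prefix = Query
  { qSelect = \ev -> prefix `T.isPrefixOf` teNamespace ev
  , qFold   = \xs -> if null xs then 0 else sum xs / fromIntegral (length xs)
  }
```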

Workbench

The reporting pipeline has been augmented with direct support for customizable, and stylable, TeX rendering - currently receiving final touches.

Porting our performance workbench's profile definitions to Haskell, and providing them with an appropriate test suite, is ongoing work. Our goal is both to make profile definition and validation more reliable, and to facilitate usage by engineers less familiar with the workbench.
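
To give a flavour of what that buys us - purely as a hypothetical sketch, the actual workbench types will differ - a typed profile can be validated before anything is deployed, instead of failing somewhere deep inside bash/JSON land:

```haskell
-- Hypothetical profile shape; field names are illustrative only.
data Profile = Profile
  { profName       :: String
  , profNodes      :: Int  -- number of cluster nodes
  , profUtxo       :: Int  -- initial UTxO set size
  , profDelegators :: Int  -- number of delegated wallets
  } deriving (Eq, Show)

-- A check that a test suite can exercise exhaustively, and that the type
-- checker forces every caller to confront.
validateProfile :: Profile -> Either String Profile
validateProfile p
  | profNodes p <= 0 =
      Left (profName p ++ ": node count must be positive")
  | profUtxo p < profDelegators p =
      Left (profName p ++ ": UTxO set cannot be smaller than the number of delegators")
  | otherwise = Right p
```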

Tracing

The work to improve system metrics as presented to Prometheus is still ongoing. Type annotations, as well as Prometheus labels introduced for certain metrics (to convey e.g. build information), will make that interface more versatile. This also facilitates configuring monitoring or dashboards, such as Grafana, on top of those Prometheus metrics.
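
For illustration - the metric and label names below are made up; only the Prometheus text exposition format itself (the # TYPE line, labels in braces) is standard - an annotated, labelled metric could be rendered along these lines:

```haskell
import Data.List (intercalate)

-- Render one metric in Prometheus text exposition format, with a type
-- annotation and a set of labels (e.g. build information).
renderMetric
  :: String             -- metric name (illustrative)
  -> String             -- Prometheus metric type: "counter", "gauge", ...
  -> [(String, String)] -- labels as key/value pairs
  -> Double             -- sample value
  -> String
renderMetric name ty labels value = unlines
  [ "# TYPE " ++ name ++ " " ++ ty
  , name ++ renderLabels labels ++ " " ++ show value
  ]
 where
  renderLabels [] = ""
  renderLabels ls =
    "{" ++ intercalate "," [ k ++ "=\"" ++ v ++ "\"" | (k, v) <- ls ] ++ "}"

-- ghci> putStr (renderMetric "node_block_height" "gauge"
--                 [("version", "8.10.0"), ("commit", "abcdef0")] 42)
-- # TYPE node_block_height gauge
-- node_block_height{version="8.10.0",commit="abcdef0"} 42.0
```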

UTxO Growth

For the UTxO scaling benchmarks, we've augmented the workbench with the capability to inject a custom synthesized chain into the deployment, and to start a benchmark only after replaying that chain - whereas our benchmarks usually start with just a genesis block.

To achieve that, several components of our tooling needed new features: distributing that chain to the node cluster, and making analysis work without requiring trace evidence that each block in the chain was forged by a benchmarking node. Cluster timing also had to be adjusted to account for the gap between genesis start time and the chain tip. However, this mechanism opens up the possibility of having a very distinct ledger state at hand for a benchmark - one that has been specifically crafted via a series of pre-defined transactions constituting the blocks of the synthesized chain.
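
To illustrate just the timing aspect - this is not the workbench's actual code, merely the arithmetic behind such an adjustment - the genesis system start can be shifted back so that the tip of the pre-synthesized chain already lies behind the wall clock when the benchmark begins:

```haskell
import Data.Time.Clock (NominalDiffTime, UTCTime, addUTCTime, getCurrentTime)

-- Choose a genesis system start such that slot `tipSlot` (the synthesized
-- chain's tip) is already in the past when the run starts.
-- Illustrative sketch only.
adjustedSystemStart
  :: NominalDiffTime  -- slot length, e.g. 1s
  -> Word             -- slot number of the synthesized chain's tip
  -> IO UTCTime
adjustedSystemStart slotLength tipSlot = do
  now <- getCurrentTime
  pure $ addUTCTime (negate (fromIntegral tipSlot * slotLength)) now
```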

In the future, we plan to flesh out a more general design of that mechanism, which is currently tied to one very specific use case.

· 3 min read
Michael Karg

High level summary

  • Benchmarking: Release benchmarks for 8.9.1 have been performed and analysed.
  • Development: We've implemented a benchmarking setup for UTxO-HD's LMDB (on-disk) backend.
  • Workbench: The now modular, nix-based genesis creation has been merged to master; DRep delegation and integration of a new cardano-cli command are ongoing.
  • Tracing: Benchmarking the new handle registry feature in cardano-tracer is complete; quality-of-life improvements to Prometheus output.
  • UTxO Growth: We've adjusted our framework to support running UTxO scaling benchmarks on both a single node and a cluster.
  • Nomad cluster: New multi-cluster support with the capability to quickly adjust to changes in deployed hardware.

Low level overview

Benchmarking

We've performed a full set of release benchmarks for Node 8.9.1. Comparing with release 8.9.0, we could not detect any performance risks for that version.

Development

In the context of UTxO scaling, we want to assess the feasibility of the current on-disk solution (LMDB) of a UTxO-HD enabled node. Using that, the UTxO set will be kept in live tables and snapshots on disk, significantly reducing memory requirements.

We've implemented a benchmark setting, and a node service configuration, supporting direct disk access to a dedicated device which can be initialized with optimized file system and mount settings. Its purpose is to serve as storage for the highly performance-critical disk I/O of the LMDB component.

Workbench

Our automation for creating all flavours of geneses has seen cleanup and refactoring - which has been merged to master. It can now use a more principled, and rigorously checked, modular approach to define, create and cache the desired genesis files.

Work on integrating new cardano-cli functionality into our automation is ongoing. The performance workbench will support a different, and updated, CLI command which will allow injection of DRep delegations into genesis.

Tracing

Benchmarking cardano-tracer's new handle registry feature has been performed and evaluated. We're satisfied to see clear performance improvements along with cleaner code and much better test coverage. In particular, allocation rate and the number of garbage collections (GCs) could be significantly reduced, along with the CPU time required for performing GCs. This will allow for higher trace message throughput given identical system resources - plus fewer system calls issued to the OS in the process.

Furthermore, the new tracing system is getting improvements for its Prometheus output - like providing version numbers as metrics, or annotating metrics with their type - enhancing the output's overall utility.

UTxO Growth

The performance workbench now supports profiles aimed at simulating UTxO growth both for a single node and an entire cluster. Additionally, simulating different RAM sizes in combination with specific UTxO set sizes is supported. For a single block producing node, the focus is on quick turnaround when running a benchmark, gaining insight into the node's RAM usage and possible impact on the forging loop.

The cluster profiles enable capturing block diffusion metrics as well; however, they require a much longer runtime. We can now successfully benchmark the node's behaviour when dealing with UTxO set sizes 4x to 5x that of current mainnet, as well as a possible change in behaviour when operating close to the physical RAM limit as a result.

Nomad cluster

Our backend now supports allocating and deploying Nomad jobs for multiple clusters simultaneously - all while keeping existing automations operational. We've taken special precautions so that a cluster, as seen by the backend, can be efficiently and easily modified to reflect newly deployed, or changed, hardware. Additionally, we've added support for host volumes inside a Nomad allocation - which will be needed for benchmarking UTxO-HD's on-disk solution.

· 5 min read
Michael Karg

High level summary

  • Benchmarking: We've performed release benchmarks for Node 8.9.0. Additionally, we benchmarked different GC settings for cardano-node.
  • Development: Ongoing work on the reporting pipeline and high-level profile definitions.
  • Workbench: In conjunction with DRep delegations in genesis, we're working on adjustments to a new cardano-cli command.
  • Tracing: Test coverage for the new handle registry feature in cardano-tracer is complete.
  • UTxO Growth: Currently, we're developing a series of benchmarks targeting performance implications of increased UTxO set size.
  • Nomad cluster: Disk storage safety net; better admin access to Nomad nodes; basic backend support for more than 1 cluster; new latency service.

Low level overview

Benchmarking

We've performed a full set of release benchmarks for Node 8.9.0. Initially, we identified a performance regression in connection to ledger snapshots. This has been addressed very swiftly. Having re-run the fixed version, we could detect no performance risks in comparison with 8.7.2 / 8.7.3.

In an additional set of benchmarks, we targeted the garbage collector (GC) settings that cardano-node is built with by default. Specifically, we compared these (copying, sequential GC) as a baseline to using the parallel GC, the compacting GC and the non-moving GC - all of which are supported by GHC's runtime system.

GC is always a trade-off between space and time (and, as a consequence, responsiveness). We measured the parallel GC offering a slight increase in responsiveness, at the cost of delaying some evaluations - which is suboptimal for a block forger. The compacting GC could clearly achieve a smaller RAM footprint, but only at the cost of increased CPU usage - and clearly worsened responsiveness. The non-moving GC could greatly enhance responsiveness - but increased the RAM footprint tremendously, and introduced delays in the forging loop. In conclusion: the existing default is still the best choice by far for cardano-node - validated both on GHC8.10.7 and GHC9.6.3.

Development

The work on moving benchmark profile, and genesis, definitions out of the bash scripting / JSON data transformation space is still ongoing. Type safety and a test suite for those tasks will allow for a much more principled approach.

The implementation of additional rendering formats and report templates for our reporting pipeline has been completed; it is currently in the testing and validation phase.

Workbench

We're working on integrating new cardano-cli functionality in our automations. Injecting DRep delegations into genesis - for Conway ledger benchmarks - will require us to use a new CLI command, which differs in output structure and options from the one we're using to inject stake pool delegations. This requires us to implement an additional post-processing step for backends to find everything as expected.

Furthermore, a PR has been merged which refactors and cleans up benchmarking profiles, with a focus on fine-tuning solo-node benchmarks which scrutinize a single cardano-node process.

Tracing

The test suite for cardano-tracer's new handle registry feature is complete, and the new feature passes all tests. At the moment, we're preparing it for merging into master.

UTxO Growth

We're developing a series of benchmarks that will provide insight into possible changes to the Node's performance characteristics given different UTxO set sizes and numbers of delegated wallets. What we aim to capture in these benchmarks is the system's capability to scale with UTxO growth - while simultaneously evaluating hardware requirements. The workloads will be based on existing release benchmarks, but allow for flexibility regarding UTxO set and delegations. They will target the existing in-memory solution, and at the same time permit feasibility testing of UTxO-HD's on-disk flavour - which does not keep the entire UTxO set in RAM permanently.

Nomad cluster

Implementation of cluster machine disk space checking and garbage collection is complete. A requirement was that no monitoring process interferes with a running benchmark, so a non-synchronous approach was chosen to guarantee enough disk space. This prevents failing runs, and thus the necessity to repeat them.

In the process, the workbench backend was equipped with a wider range of cluster commands and abstractions, which makes administrating cluster machines more flexible. This includes a new service to create a network latency matrix for deployed cluster hardware - generalizing the approach chosen during the Nomad cluster's initial validation phase. This can guarantee the validity of existing baselines in case of hardware reboots, or changes in topology.

Last but not least, the backend is currently receiving an additional feature: supporting more than one hardware cluster. This will enable us, in the future, to benchmark on ephemeral clusters - without interfering with the hard requirements, or the schedule, of release benchmarking on our default deployment. The motivation is being able to benchmark different hardware configurations, along with varying cardano-node options and initial ledger states, on a parallel schedule - without having to keep those clusters running at all times.

· 3 min read
Michael Karg

High level summary

  • Benchmarking: Release benchmarks for 8.8.0 have been performed; we created a local repro for a residual issue.
  • Performance: We've implemented and benchmarked two candidates investigating residual issues with GHC9.6.
  • Development: Work on the reporting pipeline is ongoing; integration of DReps into benchmarking workloads has begun.
  • Workbench: Implementation of high-level profile definition is ongoing.
  • Tracing: The handle registry feature for cardano-tracer is completed; currently in testing.
  • Nomad cluster: Increased robustness of deployment and run monitoring has been merged; work on garbage collection has started.

Low level overview

Benchmarking

We've performed a full set of release benchmarks for Node 8.8.0-pre. Comparing with release 8.7.2, we could not detect any performance risks for that version. We even saw a slight improvement in block fetch related metrics, which led to slightly improved block diffusion times.

Furthermore, we've managed to boil down a complex residual performance issue measured on the cluster to a local reproduction. This enables our DevX team, with highly specialized knowledge of GHC's compiler internals, to investigate each step in code generation and optimization, and to independently observe the effects of code changes to the affected component.

Performance

Work on the remaining performance issue with GHC9.6 led us to produce two candidates based on Node 8.7.2, benchmarking the implications that small, local changes have for GHC9.6's optimizer. Though those candidates did not uncover the issue's root cause, they were able to disprove a hypothesis as to its nature, and to quantify the performance impact of said small changes.

Development

Node 8.8.0 comes with capabilities to inject DReps and DRep delegations into Conway genesis. We've started work on integrating those into our automations, and on setting sensible values for benchmarking. As the aforementioned delegations represent a new data structure in the Conway ledger, we aim to run existing workflows extended with varying sizes of that new structure, measuring the pressure it puts on ledger queries and operations.

Workbench

The performance workbench relies heavily on shell scripting and manipulating JSON data for a great part of its features. This approach is very effective for quick experimentation, but lacks verifiable properties, as well as accessibility for new users of the workbench.

After the successful Haskell port of cluster topology creation and verification, we're currently applying the same model to porting the entirety of benchmarking profiles to Haskell. The obvious gains are widening the workbench's audience, both users and developers, as well as implementing a principled approach to all workbench data structures and transformations.

At the same time, we're porting workbench's many options to create fine-tuned geneses, following the same approach.

Tracing

We've outfitted cardano-tracer with a handle registry feature that lets the service work on file handles internally, rather than opening and closing files for each operation. The feature is completed; at the moment we're adding appropriate test cases to the service's test suite for validation of its behaviour, and for safeguarding future development.
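
The gist of the approach - as a minimal sketch, not cardano-tracer's actual implementation - is to cache opened handles in a registry keyed by file path and reuse them, instead of opening and closing the file for every single message:

```haskell
import           Control.Concurrent.MVar (MVar, modifyMVar, newMVar)
import qualified Data.Map.Strict as Map
import           System.IO (Handle, IOMode (AppendMode), hPutStrLn, openFile)

-- A registry of handles for log files currently in use.
type HandleRegistry = MVar (Map.Map FilePath Handle)

newRegistry :: IO HandleRegistry
newRegistry = newMVar Map.empty

-- Look up a cached handle for the path, opening (and caching) it on first use.
getHandle :: HandleRegistry -> FilePath -> IO Handle
getHandle registry path = modifyMVar registry $ \handles ->
  case Map.lookup path handles of
    Just h  -> pure (handles, h)
    Nothing -> do
      h <- openFile path AppendMode
      pure (Map.insert path h handles, h)

-- Writing a trace message now costs a Map lookup instead of an open/write/close cycle.
writeTrace :: HandleRegistry -> FilePath -> String -> IO ()
writeTrace registry path msg = do
  h <- getHandle registry path
  hPutStrLn h msg
```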

Nomad backend

Several improvements for our cluster backend have been merged to master, increasing its overall robustness. We can now safely handle some corner cases where Nomad processes unexpectedly exited, or deployments errored out. Furthermore, an ongoing run can now reliably survive a temporary loss of heartbeat connection between Nomad client and server, without the benchmarking metrics being affected.

Currently, we're working on a reliable automation of garbage collecting old nix store entries on the cluster machines, as they fill up disk space. The design has to consider both not interfering with ongoing benchmarks, and avoiding deployment overhead caused by cleaning the store too frequently.

· 2 min read
Michael Karg

High level summary

  • Benchmarking: GHC 9.6.3 benchmarks for node 8.7.2 have been performed.
  • Development: Additional features for our reporting pipeline, while simultaneously reducing dependency footprint.
  • Tracing: Implementation for cardano-tracer to work on handles instead of files; work on New Tracing Quickstart document has begun.
  • Nomad cluster: We're preparing an upgrade to the latest Nomad version.

Low level overview

Benchmarking

We've performed a full set of GHC 9.6.3 benchmarks for node 8.7.2. For recommending GHC9.6 as a default build platform for cardano-node - from a performance perspective - we observe only one residual issue. As a way to address this, we've decided to create a reproduction benchmark targeting the affected component.

Development

Our reporting pipeline will be expanded to support a wider range of rendering formats, as well as report templates. As the pipeline is part of our workbench - and thus gets downloaded and built when entering the workbench shell - it's good practice to keep a small dependency footprint. When reworking the pipeline, we aim to simultaneously reduce dependencies.

Tracing

So far, cardano-tracer has internally been using files, or file names, for the purpose of logging trace messages it receives via forwarding. This is simple, but induces quite some overhead at runtime: files have to be opened and closed for each message. Using and managing open file handles inside cardano-tracer does remove that overhead, but unsurprisingly introduces some complexity into the application code. Currently, we're working on implementing that change.

Furthermore, we're working on a Quickstart document for the new tracing system, with end users as its intended audience. It will contain recommended production use setup(s), and how to efficiently configure and run them step by step. Additionally, it will provide a brief, but comprehensive, overview of the features at the user's disposal.

Nomad backend

On the Nomad cluster, we've experienced undesired system behaviour when the heartbeat between the Nomad server and a client is interrupted temporarily - although the Nomad job itself is still 100% functional. A Nomad upgrade to the latest version promises to fix that, but in turn comes with other issues. We're currently working on adapting our automation and deployment around those known issues, before we can eventually apply the upgrade.