· 5 min read
Michael Karg

High level summary

  • Benchmarking: We've performed and analysed benchmarks in the Conway era, with DReps injected.
  • Development: Tracing DRep data has been implemented; improved error reporting in tx-generator and analysis quick queries are ongoing work.
  • Workbench: We now fully support the new CLI create-testnet-data command and DRep injection into Conway genesis. Haskell profile definition work is ongoing.
  • Tracing: Various additions to Node metrics are being worked on, such as build info and block producer role. Metrics naming will be further harmonized.
  • UTxO Growth: We've finalized analysis and reports of all benchmarks targeting UTxO scaling scenarios.
  • UTxO-HD / LMDB: We've performed multiple runs benchmarking the LMDB (on-disk) backend of UTxO-HD.

Low level overview

Benchmarking

We've run and analyzed a full set of benchmarks comparing the Conway ledger against the Babbage one, on Node 8.10.1-pre. For Conway, our additional goal was to measure a vanilla ledger state against one with a large amount of DReps - and delegations to those DReps - present. The benchmarks used our existing value and Plutus workloads to remain comparable to each other.

Development

Additional ledger queries for the tracing system have been implemented and merged to master. These capture the number of DReps, and the number of existing delegations to them, as trace output - thus enabling a metric to be built on top, which can then be monitored.
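
As an illustration of the shape of such trace output and the metric derived from it, here is a minimal Haskell sketch; the type, field and metric names are hypothetical, not the node's actual identifiers.

```haskell
-- Hypothetical shape of the new trace output: a census of DReps and
-- the delegations to them, queried from ledger state.
data DRepCensus = DRepCensus
  { drepCount           :: Int  -- number of registered DReps
  , drepDelegationCount :: Int  -- number of existing delegations to DReps
  } deriving (Show)

-- Derive monitorable metric name/value pairs from one trace emission.
asMetrics :: DRepCensus -> [(String, Int)]
asMetrics (DRepCensus dreps delegs) =
  [ ("dreps.count",       dreps)
  , ("dreps.delegations", delegs)
  ]
```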

The (in our case) non-deterministic nature of shutting down different cluster setups - both local and cloud-based - carries the possibility that our transaction generation service occasionally misclassifies a regular shutdown as an error. Furthermore, in the case of network malfunctions, the service's errors are too unspecific. By implementing thread labels for submission threads, corresponding to each submission target, and by adding custom smart signal handlers, we'll improve the generator's error reporting significantly.
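
A minimal sketch of those two measures, using GHC's labelThread and the unix package's installHandler; the submission targets and the shutdown flag are illustrative only.

```haskell
import Control.Concurrent (forkIO, myThreadId, threadDelay)
import Control.Concurrent.MVar (modifyMVar_, newMVar)
import Control.Monad (forM_, forever)
import GHC.Conc (labelThread)
import System.Posix.Signals (Handler (Catch), installHandler, sigTERM)

main :: IO ()
main = do
  shuttingDown <- newMVar False
  -- Smart signal handler: record that SIGTERM initiated a regular
  -- shutdown, so subsequent connection teardown is not reported as an
  -- error (the submission loop would consult shuttingDown when
  -- classifying failures).
  _ <- installHandler sigTERM
         (Catch (modifyMVar_ shuttingDown (\_ -> pure True))) Nothing
  -- One submission thread per target, labelled after that target, so
  -- error reports can name the affected connection.
  forM_ ["node-0:3001", "node-1:3001"] $ \target -> forkIO $ do
    tid <- myThreadId
    labelThread tid ("submission:" ++ target)
    forever (threadDelay 1000000)  -- stand-in for the submission loop
  threadDelay 3000000              -- keep the sketch alive briefly
```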

The initial tests for quick queries are being developed further. We're moving towards a principled, and generalized, syntax that supports both prepared, parametrizable queries from application code and ad-hoc queries stated e.g. on the command line.
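
To give an idea of what such a generalized syntax could look like, here is a hypothetical sketch of a quick-query AST in Haskell; the constructor names and the example metric are made up, not the pipeline's actual design.

```haskell
-- A hypothetical quick-query AST supporting composition and parameters.
data QuickQuery
  = Metric String                -- select a trace metric by name
  | SlotRange Int Int QuickQuery -- restrict to a slot interval
  | Above Double QuickQuery      -- keep samples above a threshold
  | CDF QuickQuery               -- summarize as a distribution
  deriving (Show)

-- A prepared, parametrizable query, stated from application code ...
forgeTailLatency :: Double -> QuickQuery
forgeTailLatency threshold = CDF (Above threshold (Metric "forge.loop"))

-- ... while the same AST could equally be parsed from an ad-hoc
-- command line string such as "cdf above 0.05 forge.loop".
```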

Workbench

The performance workbench now fully supports the new cardano-cli command create-testnet-data. We use it to inject both stake delegated to stake pools and - recently added - stake delegated to DReps into genesis. It has proven very useful and versatile so far, and will eventually replace the current create-staked command.

Work on porting our performance workbench's profile definitions to Haskell, and providing them with an appropriate test suite, is still ongoing; currently, we're integrating all new profile families that came out of the UTxO growth scenarios.

Tracing

New metrics are being implemented for the tracing system. They will also be part of the Prometheus output and as such accessible to monitoring services. These include cardano-node's detailed build info, as well as a node's block producer status - i.e. the presence of forger credentials. The new metrics are being backported to the legacy tracing system, too.
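
The block producer metric, for instance, reduces to the presence of forging credentials; a minimal sketch under assumed (not actual) metric names:

```haskell
import Data.Maybe (isJust)

-- Hypothetical sketch: expose the block producer role as a 0/1 gauge,
-- derived from whether forger credentials were configured.
blockProducerMetric :: Maybe creds -> (String, Int)
blockProducerMetric forgerCredentials =
  ("node.isBlockProducer", if isJust forgerCredentials then 1 else 0)
```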

Furthermore, we've determined the need to revisit metrics naming. There's still a divergence between naming in the legacy and the new system. While this could be mitigated by passing in extra config options, we think that a transition to the new system should not impose any unnecessary effort on node operators. A design to fully harmonize the existing naming schemata is currently being drawn up.
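
The idea, sketched with entirely made-up metric names, is one shared mapping applied to both tracing systems' outputs, so operators see identical names without extra config options:

```haskell
-- Illustration only: all names below are invented, not the node's
-- actual metric names. One canonical mapping covers both systems.
harmonizeName :: String -> String
harmonizeName name = case name of
  "cardano.node.blockNum"             -> "blockNum"  -- new system
  "cardano_node_metrics_blockNum_int" -> "blockNum"  -- legacy system
  other                               -> other
```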

UTxO Growth

The UTxO Growth benchmarking series has been finalized. We've finished analyses and reports for all scenarios that were tested and explored.

The overarching questions were: given a network of 32GB host systems, how large can the UTxO set grow in general; how large can it grow before the nodes have to operate close to the RAM limit over extended periods of time; and how does scaling the UTxO set size affect network metrics, such as block diffusion?

A dedicated "UTxO Scaling Squad" was set up, which drove the entire process, and we enjoyed a very focused and productive collaboration with them.

UTxO-HD / LMDB

Last but not least, we were able to benchmark UTxO-HD's on-disk backend on a network of block producing nodes, on a recent 8.9.1 version of cardano-node. The setup allowed for using a direct-access SSD device for performance-critical disk I/O, whereas the bulk of ChainDB and ledger snapshots remained on a standard AWS EBS volume.

The benchmarks comprised both optimistic and pessimistic RAM assumptions for the host OS to further optimize I/O via the page cache, as well as medium and large UTxO set sizes - the latter almost tripling current mainnet's size. The results were promising: the LMDB backend has proven able to accommodate large UTxO sets using significantly less RAM than the default all-in-memory node - and with a more than reasonable performance trade-off. Furthermore, running with pessimistic assumptions, the performance impact on LMDB was only very moderate.

· 3 min read
Michael Karg

High level summary

  • Benchmarking: We've performed benchmarks and analyses for Node versions 8.9.2 and 8.10.0.
  • Development: Design phase for implementing quick queries in the analysis pipeline has begun.
  • Workbench: We're finishing up the new features for the reporting pipeline; Haskell profile definition work is ongoing.
  • Tracing: Improving the Prometheus output is ongoing; the node's build info will be accessible as a Prometheus label.
  • UTxO Growth: Our tooling has been augmented to support benchmarks starting with a non-empty chain.

Low level overview

Benchmarking

We've performed a full set of release benchmarks for Node 8.9.2. Comparing with release 8.9.1, we could not detect any performance risks for that version.

The benchmarks for 8.10.0 have shown a slight improvement in the time the block forging loop needs to evaluate, whilst resource usage of the cardano-node process was also slightly reduced - a nice performance improvement.

Development

Our analysis pipeline is based on batch analysis of data from over 50 cluster nodes; it consumes very large amounts of trace output ex post facto, after the actual benchmark has terminated. This is very time-intensive, and not viable for observing an additional metric that you only later determine might need consideration.

We're planning to add quick queries into a benchmarking run's trace data to our analysis pipeline. These will be structured such that parameterizable, ad-hoc querying is supported. Initial tests showed that evaluation speed of such queries is fast enough to merit designing a principled, and generalized, syntax for them - and a subsequent implementation.

Workbench

The reporting pipeline has been augmented with direct support for customizable, and stylable, TeX rendering - currently receiving final touches.

Porting our performance workbench's profile definitions to Haskell, and providing them with an appropriate test suite, is ongoing work. Our goal is both to make profile definition and validation more reliable, and to facilitate usage by engineers less familiar with the workbench.

Tracing

The work to improve system metrics as presented to Prometheus is still ongoing. Type annotations, as well as Prometheus labels introduced on certain metrics to convey additional information (e.g. build information), will make that interface more versatile. This also facilitates configuring monitoring services or dashboards like Grafana on top of those Prometheus metrics.
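
A minimal sketch of the intended exposition shape, with a type annotation and labels carrying static information such as build metadata; the metric and label names are illustrative, not the node's actual ones.

```haskell
import Data.List (intercalate)

-- Render one gauge with a Prometheus TYPE annotation and labels.
renderAnnotated :: String -> [(String, String)] -> Double -> String
renderAnnotated name labels value = unlines
  [ "# TYPE " ++ name ++ " gauge"
  , name ++ renderLabels labels ++ " " ++ show value
  ]
 where
  renderLabels [] = ""
  renderLabels ls =
    "{" ++ intercalate "," [ k ++ "=\"" ++ v ++ "\"" | (k, v) <- ls ] ++ "}"
```

Invoked as renderAnnotated "cardano_build_info" [("version", "8.9.2")] 1, this yields the common Prometheus idiom of a constant gauge whose labels carry the actual information.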

UTxO Growth

For the UTxO scaling benchmarks, we've augmented the workbench with the capability to support injection of a custom synthesized chain into the deployment, and start a benchmark only after replaying that chain - whereas our benchmarks usually start just with a genesis block.

To achieve that, several components of our tooling needed new features: distributing the synthesized chain to the node cluster, and having analysis work without necessarily providing trace evidence that each block in the chain was forged by a benchmarking node. Cluster timing had to be adjusted to account for the gap between genesis start time and the chain tip. However, this entire mechanism opens up the possibility of having a very distinct ledger state at hand for a benchmark - one that's been specifically crafted via a series of pre-defined transactions constituting the blocks during creation of the synthesized chain.
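
The timing adjustment itself is simple in principle; a sketch under assumed names, using the time package:

```haskell
import Data.Time.Clock (NominalDiffTime, UTCTime, addUTCTime)

-- Hypothetical sketch: with a synthesized chain injected, shift the
-- cluster's reference time past the chain tip, instead of starting
-- right at the genesis block.
benchmarkStart :: UTCTime -> NominalDiffTime -> Integer -> UTCTime
benchmarkStart genesisStart slotLength tipSlot =
  addUTCTime (slotLength * fromIntegral tipSlot) genesisStart
```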

In the future, we plan to flesh out a more general design for that mechanism, which is currently tied to one very specific use case.

· 3 min read
Michael Karg

High level summary

  • Benchmarking: Release benchmarks for 8.9.1 have been performed and analysed.
  • Development: We've implemented a benchmarking setup for UTxO-HD's LMDB (on-disk) backend.
  • Workbench: The now modular, nix-based genesis creation has been merged to master; DRep delegation and integration of a new cardano-cli command are ongoing.
  • Tracing: Benchmarking the new handle registry feature in cardano-tracer is complete; quality-of-life improvements to Prometheus output.
  • UTxO Growth: We've adjusted our framework to support running UTxO scaling benchmarks on both a single node and a cluster.
  • Nomad cluster: New multi-cluster support with the capability to quickly adjust to changes in deployed hardware.

Low level overview

Benchmarking

We've performed a full set of release benchmarks for Node 8.9.1. Comparing with release 8.9.0, we could not detect any performance risks for that version.

Development

In the context of UTxO scaling, we want to assess the feasibility of the current on-disk solution (which is LMDB) of a UTxO-HD enabled node. Using that, the UTxO set will be kept in live tables and snapshots on disk, significantly reducing memory requirements.

We've implemented a benchmark setting, and a node service configuration, supporting direct disk access to a dedicated device which can be initialized with optimized file system and mount settings. Its purpose is to serve as storage for the highly performance-critical disk I/O of the LMDB component.

Workbench

Our automation for creating all flavours of geneses has seen cleanup and refactoring - which has been merged to master. It can now use a more principled, and rigorously checked, modular approach to define, create and cache the desired genesis files.

Work on integrating new cardano-cli functionality into our automation is ongoing. The performance workbench will support a different, and updated, CLI command which will allow injection of DRep delegations into genesis.

Tracing

Benchmarking cardano-tracer's new handle registry feature has been performed and evaluated. We're satisfied with seeing clear performance improvements along with cleaner code, and much better test coverage. Especially the allocation rate and the number of garbage collections (GC) could be significantly reduced, along with the CPU time required for performing GCs. This will allow for higher trace message throughput given identical system resources - plus fewer system calls issued to the OS in the process.

Furthermore, the new tracing system is getting improvements for its Prometheus output - like providing version numbers as metrics, or annotating metrics with their type - enhancing the output's overall utility.

UTxO Growth

The performance workbench now supports profiles aimed at simulating UTxO growth both for a single node and an entire cluster. Additionally, simulating different RAM sizes in combination with specific UTxO set sizes is supported. For a single block producing node, the focus is on quick turnaround when running a benchmark, gaining insight into the node's RAM usage and possible impact on the forging loop.

The cluster profiles enable capturing block diffusion metrics as well; however, they require a much longer runtime. We can now successfully benchmark the node's behaviour when dealing with UTxO set sizes 4x - 5x that of current mainnet, as well as a possible change in behaviour when operating close to the physical RAM limit as a result.

Nomad cluster

Our backend now supports allocating and deploying Nomad jobs for multiple clusters simultaneously - all while keeping existing automations operational. We've taken special precautions so that a cluster, as seen by the backend, can be efficiently and easily modified to reflect newly deployed, or changed, hardware. Additionally, we've added support for host volumes inside a Nomad allocation - which will be needed for benchmarking UTxO-HD's on-disk solution.

· 5 min read
Michael Karg

High level summary

  • Benchmarking: We've performed release benchmarks for Node 8.9.0. Additionally, we benchmarked different GC settings for cardano-node.
  • Development: Ongoing work on the reporting pipeline and high-level profile definitions.
  • Workbench: In conjunction with DRep delegations in genesis, we're working on adjustments to a new cardano-cli command.
  • Tracing: Test coverage for the new handle registry feature in cardano-tracer is complete.
  • UTxO Growth: Currently, we're developing a series of benchmarks targeting performance implications of increased UTxO set size.
  • Nomad cluster: Disk storage safety net; better admin access to Nomad nodes; basic backend support for more than 1 cluster; new latency service.

Low level overview

Benchmarking

We've performed a full set of release benchmarks for Node 8.9.0. Initially, we identified a performance regression in connection to ledger snapshots. This has been addressed very swiftly. Having re-run the fixed version, we could detect no performance risks in comparison with 8.7.2 / 8.7.3.

In an additional set of benchmarks, we targeted the garbage collector (GC) settings that cardano-node is built with by default. Specifically, we compared these (copying, sequential GC) as a baseline to using the parallel GC, the compacting GC and the non-moving GC - all of which are supported by GHC's runtime system. As GC is always a trade-off between space and time (and, as a consequence, responsiveness), we could measure the parallel GC offering a slight increase in responsiveness at the cost of delaying some evaluations - which is suboptimal for a block forger. The compacting GC achieved a clearly smaller RAM footprint, but only at the cost of increased CPU usage - and clearly worsened responsiveness. The non-moving GC greatly enhanced responsiveness - but increased the RAM footprint tremendously, and introduced delays in the forging loop. In conclusion: the existing default is still by far the best choice for cardano-node - validated both on GHC8.10.7 and GHC9.6.3.

Development

The work on moving benchmark profile, and genesis, definitions out of the bash scripting / JSON data transformation space is still ongoing. Type safety and a test suite for those tasks will allow for a much more principled approach.

The implementation of additional rendering formats and report templates for our reporting pipeline has been completed; it is currently in testing and validation phase.

Workbench

We're working on integrating new cardano-cli functionality in our automations. Injecting DRep delegations into genesis - for Conway ledger benchmarks - will require us to use a new CLI command, which differs in output structure and provided options from the one we're using to inject stake pool delegations. This requires us to implement an additional post-processing step so that backends find everything as expected.

Furthermore, a PR has been merged which refactors and cleans up benchmarking profiles, with a focus on fine-tuning solo-node benchmarks which scrutinize a single cardano-node process.

Tracing

The test suite for cardano-tracer's new handle registry feature is complete, and the new feature passes all tests. At the moment, we're preparing it for merging into master.

UTxO Growth

We're developing a series of benchmarks that will provide insight into possible changes to the Node's performance characteristics given different UTxO set sizes and numbers of delegated wallets. What we aim to capture in these benchmarks is the system's capability to scale with UTxO growth - while simultaneously evaluating hardware requirements. The workloads will be based on existing release benchmarks, but allow for flexibility regarding UTxO set and delegations. They will target the existing in-memory solution, and at the same time permit feasibility testing of UTxO-HD's on-disk flavour - which does not keep the entire UTxO set in RAM permanently.

Nomad cluster

Implementation of cluster machine disk space checking and garbage collection is complete. A requirement was that no monitoring process interferes with a running benchmark, so a non-synchronous approach was chosen to guarantee enough disk space. This prevents failing runs, and thus the necessity to repeat them.
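
A minimal sketch of such a detached checker; the mount point, threshold and schedule are illustrative assumptions:

```haskell
import Control.Concurrent (threadDelay)
import Control.Monad (forever, when)
import System.Process (callProcess, readProcess)

-- Poll free space on the nix store's volume; collect garbage only when
-- no benchmark is running and space falls below a threshold.
diskGuard :: IO Bool -> IO ()
diskGuard benchmarkRunning = forever $ do
  out <- readProcess "df" ["--output=avail", "/nix"] ""
  let availKiB = read (last (words out)) :: Integer
  busy <- benchmarkRunning
  when (availKiB < 50 * 1024 * 1024 && not busy) $  -- below ~50 GiB free
    callProcess "nix-collect-garbage" ["--delete-older-than", "30d"]
  threadDelay (60 * 1000000)  -- re-check once a minute
```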

In the process, the workbench backend was equipped with a wider range of cluster commands and abstractions, which makes administrating cluster machines more flexible. This includes a new service to create a network latency matrix for deployed cluster hardware - generalizing the approach chosen during the Nomad cluster's initial validation phase. This can guarantee the validity of existing baselines in case of hardware reboots, or changes in topology.

Last but not least, the backend is currently receiving an additional feature: support for more than one hardware cluster. This will enable us, in the future, to benchmark on ephemeral clusters - without interfering with the hard requirements, or the schedule, of release benchmarking on our default deployment. The motivation is being able to benchmark different hardware configurations, along with varying cardano-node options and initial ledger states, on a parallel schedule - and without having to keep those clusters running at all times.

· 3 min read
Michael Karg

High level summary

  • Benchmarking: Release benchmarks for 8.8.0 have been performed; we created a local repro for a residual issue.
  • Performance: We've implemented and benchmarked two candidates investigating residual issues with GHC9.6.
  • Development: Work on the reporting pipeline is ongoing; integration of DReps into benchmarking workloads has begun.
  • Workbench: Implementation of high-level profile definition is ongoing.
  • Tracing: The handle registry feature for cardano-tracer is completed; currently in testing.
  • Nomad cluster: Increased robustness of deployment and run monitoring has been merged; work on garbage collection has started.

Low level overview

Benchmarking

We've performed a full set of release benchmarks for Node 8.8.0-pre. Comparing with release 8.7.2, we could not detect any performance risks for that version. We even saw a slight improvement in block fetch related metrics, which led to slightly improved block diffusion times.

Furthermore, we've managed to boil down a complex residual performance issue measured on the cluster to a local reproduction. This enables our DevX team, with highly specialized knowledge of GHC's compiler internals, to investigate each step in code generation and optimization, and independently observe the effects of code changes to the affected component.

Performance

Work on the remaining performance issue with GHC9.6 led us to produce two candidates based on Node 8.7.2, benchmarking the implications small local changes have for GHC9.6's optimizer. Though those candidates did not uncover the issue's root cause, they were able to disprove a hypothesis as to its nature, and to quantify the performance impact of said small changes.

Development

Node 8.8.0 comes with capabilities to inject DReps and DRep delegations into Conway genesis. We've started work on integrating those into our automations, and on setting sensible values for benchmarking. As those delegations represent a new data structure in the Conway ledger, we aim to run existing workflows extended with varying sizes of that new structure, measuring the pressure it exerts on ledger queries and operations.

Workbench

The performance workbench relies heavily on shell scripting and manipulating JSON data for a great part of its features. This approach is very effective for quick experimentation, but lacks verifiable properties as well as accessibility for new users of the workbench.

After the successful Haskell port of cluster topology creation, and verification, we're currently applying the same model in porting the entirety of benchmarking profiles to Haskell. The obvious gains are widening workbench's audience both for users and developers, as well as implementing a principled approach to all workbench data structures and transformations.
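
To illustrate the direction (field names and values here are invented, not the workbench's actual profile schema): a typed profile that a test suite can validate, replacing ad-hoc JSON manipulation.

```haskell
-- Hypothetical typed profile definition.
data Profile = Profile
  { profileName :: String
  , nodeCount   :: Int
  , utxoEntries :: Int
  , delegators  :: Int
  , plutusLoad  :: Bool
  } deriving (Show, Eq)

-- A default the test suite can check invariants against, e.g. that
-- derived genesis parameters stay consistent across profiles.
defaultProfile :: Profile
defaultProfile = Profile
  { profileName = "value-only"
  , nodeCount   = 52
  , utxoEntries = 4000000
  , delegators  = 1000000
  , plutusLoad  = False
  }
```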

At the same time, we're porting workbench's many options to create fine-tuned geneses, following the same approach.

Tracing

We've outfitted cardano-tracer with a handle registry feature that lets the service work on file handles internally, rather than opening and closing files for each operation. The feature is completed; at the moment we're adding appropriate test cases to the service's test suite for validation of its behaviour, and for safeguarding future development.
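
The core idea can be sketched in a few lines (an illustration, not cardano-tracer's actual code): handles are opened once, cached per path, and reused thereafter.

```haskell
import Control.Concurrent.MVar (MVar, modifyMVar)
import qualified Data.Map.Strict as Map
import System.IO (Handle, IOMode (AppendMode), openFile)

-- Registry of open handles, keyed by file path.
type HandleRegistry = MVar (Map.Map FilePath Handle)

-- Look up a cached handle, opening (and caching) it only on first use,
-- instead of opening and closing the file for every operation.
acquireHandle :: HandleRegistry -> FilePath -> IO Handle
acquireHandle registry path = modifyMVar registry $ \handles ->
  case Map.lookup path handles of
    Just h  -> pure (handles, h)     -- cached: no syscalls needed
    Nothing -> do
      h <- openFile path AppendMode  -- first use: open and cache
      pure (Map.insert path h handles, h)
```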

Nomad backend

Several improvements for our cluster backend have been merged to master, increasing its overall robustness. We can now safely handle some corner cases where Nomad processes unexpectedly exited, or deployments errored out. Furthermore, an ongoing run can now reliably survive a temporary loss of heartbeat connection between Nomad client and server, without the benchmarking metrics being affected.

Currently, we're working on a reliable automation of garbage collecting old nix store entries on the cluster machines, as they fill up disk space. The design has to consider both not interfering with ongoing benchmarks, and avoiding deployment overhead caused by cleaning the store too frequently.