
· 4 min read
Michael Karg

High level summary

  • Benchmarking: Feature benchmarks for a new tx submission logic; generalized on-disk benchmark profiles.
  • Development: New tracing system for cardano-submit-api; Node conformance testing groundwork, being applied to metrics.
  • Infrastructure: Dijkstra era being incorporated into benchmark tooling.
  • Tracing: Creating a library package and API from cardano-tracer, facilitating dedicated trace consumer applications.
  • Team: Ruslan joins Performance & Tracing; he will focus on Leios.

Low level overview

Benchmarking

We've performed and analysed feature benchmarks of version 2 of the TxSubmission protocol, which is designed to reduce redundant exchange of transactions between network nodes. To that end, we've created a dedicated benchmark which can locally reproduce the performance observations from the cloud cluster - and hence be executed with GHC profiling to track time and space usage.
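The duplicate-suppression idea at the heart of TxSubmission v2 can be sketched in a few lines of Haskell (purely illustrative; the type and function names below are made up and are not the actual mini-protocol implementation):

```haskell
-- Illustrative sketch of duplicate suppression in transaction relay.
-- All names (TxId, PeerState, announce) are hypothetical.
import qualified Data.Set as Set
import Data.Set (Set)

type TxId = String

-- Per-peer record of transaction ids the peer is already known to have.
newtype PeerState = PeerState { seenTxs :: Set TxId }

-- Given a peer's state and a batch of txids to announce, keep only the
-- ones the peer hasn't seen yet, and update the state accordingly.
announce :: PeerState -> [TxId] -> ([TxId], PeerState)
announce (PeerState seen) txids =
  let fresh = filter (`Set.notMember` seen) txids
      seen' = foldr Set.insert seen fresh
  in (fresh, PeerState seen')
```

Suppressing re-announcements like this is what reduces the redundant transaction traffic the benchmark measures.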

The on-disk (LMDB) benchmarks for the cluster have been generalized into full-fledged benchmarking profiles that can be scaled for available RAM (and thus the pressure on the on-disk backing store to perform disk I/O) - and be applied to other backing store implementations as well. This will allow for direct performance comparisons with the lsm-trees solution, once it is integrated into the Node.
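As a rough intuition for why scaling available RAM scales the pressure on the backing store, consider this deliberately crude cache model (illustrative arithmetic only, not workbench code; a uniform access pattern is assumed):

```haskell
-- Back-of-envelope model: the smaller the RAM available for caching the
-- backing store, the larger the fraction of UTxO lookups that must hit
-- disk.  Purely illustrative; not part of the benchmarking workbench.

-- Fraction of the UTxO working set that fits in cache, given a RAM limit.
residentFraction :: Double -> Double -> Double
residentFraction ramMiB workingSetMiB = min 1 (ramMiB / workingSetMiB)

-- Expected share of lookups causing disk I/O under a uniform access
-- pattern (a deliberately crude assumption).
diskHitRate :: Double -> Double -> Double
diskHitRate ramMiB workingSetMiB = 1 - residentFraction ramMiB workingSetMiB
```

Scaling the RAM limit in a profile thus directly dials the disk I/O pressure a backing store implementation has to cope with.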

Development

There's a microservice, cardano-submit-api, which is run independently from the Node (if desired). It allows for Cardano transactions to be submitted via HTTP POST, instead of Cardano native protocols. It's still using the legacy tracing system, which is why we're currently porting its logging and metrics to the new one.

We're also laying groundwork for (multi-)node conformance testing. This entails creating a specification document for the semantics of existing traces. These traces can then be emitted accordingly across diverse node implementations. Given the unified semantics, these points of evidence can then be evaluated against each other, against our Haskell reference implementation, or against a model of specified / expected behaviour, resulting in a quantifiable way to assess conformance across individual implementations. Currently, we're implementing a playground version of this in the tracing system's own test suite, where we assess whether the metrics a node exposes conform to the trace evidence in its logs, and to the metrics it forwards to cardano-tracer.
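The kind of cross-check performed in that playground version can be sketched as follows (hypothetical types and names, not the actual test suite code): a counter metric conforms if its exposed value matches the number of corresponding trace messages found in the log evidence.

```haskell
-- Sketch of a metrics-vs-traces conformance check.  All names are
-- illustrative, not the tracing system's test suite.
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

type MetricName = String
data TraceMsg = TraceMsg { msgKind :: String }

-- Count trace messages of a given kind in the log evidence.
countKind :: String -> [TraceMsg] -> Int
countKind k = length . filter ((== k) . msgKind)

-- A counter metric conforms if its exposed value equals the
-- trace-derived count.
conforms :: Map MetricName Int -> (MetricName, String) -> [TraceMsg] -> Bool
conforms metrics (metric, kind) logEvidence =
  Map.lookup metric metrics == Just (countKind kind logEvidence)
```

The same comparison generalizes to metrics forwarded to cardano-tracer, since those are just another point of evidence under the unified semantics.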

Infrastructure

As a maintenance task, we're integrating the new Dijkstra ledger era into our performance workbench and all benchmarking tools. This will allow us to specify existing profiles in the new era (allowing us to comparatively benchmark its implementation against previous eras) as well as create new benchmarks making use of any Dijkstra-specific feature.

Tracing

The trace consuming / processing service cardano-tracer had been built as a monolithic application. We're currently redesigning it as a more modular one, splitting it up into a library and an application proper (which hosts all its current high-level functionality). The underlying library will be equipped with an API that meets community standards in the future. For now, we're focusing on making all library components safely restartable and reconfigurable, as well as providing abstract, clean intra-process communications (cardano-tracer is a highly multi-threaded app). These capabilities are also verified in the test suite.
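A minimal sketch of what "safely restartable and reconfigurable" can mean for such a library component (all names are hypothetical, not the actual cardano-tracer API): state transitions happen under a lock, so concurrent callers never observe a half-configured component.

```haskell
-- Illustrative sketch of a restartable, reconfigurable component.
-- Not the cardano-tracer library API; names are made up.
import Control.Concurrent.MVar
import Data.IORef

data Component cfg = Component
  { currentCfg :: MVar (Maybe cfg)  -- Nothing = stopped
  , startCount :: IORef Int         -- diagnostic: how often (re)started
  }

newComponent :: IO (Component cfg)
newComponent = Component <$> newMVar Nothing <*> newIORef 0

start :: Component cfg -> cfg -> IO ()
start c cfg = modifyMVar_ (currentCfg c) $ \_ -> do
  modifyIORef' (startCount c) (+ 1)
  pure (Just cfg)

stop :: Component cfg -> IO ()
stop c = modifyMVar_ (currentCfg c) (const (pure Nothing))

-- Reconfiguring = restarting with a new config; because each transition
-- holds the MVar, concurrent threads see either the old or the new
-- configuration, never an intermediate state.
restart :: Component cfg -> cfg -> IO ()
restart c cfg = stop c >> start c cfg
```

In a highly multi-threaded application, pushing every state transition through such a single synchronization point is what makes restarts safe to trigger from anywhere.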

This will facilitate rapid development of custom, specialized applications for trace consumption and processing: The library package will provide all mid-level abstractions, as well as the Cardano native multiplexed forwarding protocol. It will allow any application to focus exclusively on implementing the high-level features it aims to provide.

Team

Ruslan joined Performance & Tracing as a Software Engineer. He has a ton of functional programming experience under his belt - in Idris, Haskell, and Scala. Additionally, he used to work as a performance engineer for a large, distributed commercial system. All of this makes him an ideal candidate for Cardano performance engineering - and, going forward, for all new Leios benchmarks and performance tests. Welcome, Ruslan!

· 4 min read
Michael Karg

High level summary

  • Benchmarking: Release benchmarks for 10.5; LMDB (UTxOs on-disk) benchmarks.
  • Development: New memory-constrained benchmark families, targeting UTxOs on-disk.
  • Infrastructure: Migration of bench cluster completed.
  • Tracing: Deprecation of legacy system; TCP forwarding merged; improved self-documentation.
  • Meetup: Performance & Tracing meetup held in Cardiff, Wales.

Low level overview

Benchmarking

We've performed and analysed release benchmarks for Node 10.5. The pre-release turned on peer sharing by default; our benchmarks indicated a negative performance impact when enabling this on block producers. The current release 10.5.1 does not enable peer sharing for block producers; the published final results can be found in the Performance report for 10.5.

Additionally, we've achieved meaningful benchmarks for UTxOs-on-disk, which use the LMDB backend. Our new profiles support seamless scaling of RAM pressure on the backend, forcing disk I/O to a varying degree. We're currently analysing the observations made, and gathering more data using different scaling factors if needed; the goal is a reliable assessment of LMDB's viability for block producing nodes.

Development

Developing RAM-constrained benchmarks that would put tunable pressure on a UTxOs-on-disk backend posed a unique challenge.

First and foremost, limiting memory for past in-memory benchmarks has never been a requirement at all. A consistent approach to do so given the existing deployment had to be built, along with pertinent diagnostic tooling. Second, the LMDB backend is not managed by Haskell's GHC runtime, but comes with its own memory management - which required us to develop a double-pronged approach to selectively apply RAM limits. Lastly, other parts of the Node's code didn't support executing in tightly limited, constant space and would lead to the OS terminating the Node for running out of memory.

The interaction of various cgroup limits on Linux kernels, memory settings in our Nomad deployment and GHC RTS options let us create a stable runtime behavior over the course of the benchmark - a hard requirement, as system metrics taken at the beginning of a run must be comparable to those towards the end. A blocker for initializing the Node for benchmarks was resolved in cardano-node PR#6295: Using mmap allowed us to use the OS's virtual memory subsystem for on-demand loading instead of it being managed by the Haskell runtime - which significantly brought down the heap size required for the task.
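Concretely, the layering described above can be sketched like this (illustrative values only; the real deployment applies the OS-level limit via Nomad rather than systemd-run, and the actual limits and script names differ):

```shell
# 1. OS-level hard cap: run the benchmark inside a cgroup with a memory
#    limit (shown here via systemd-run; our deployment uses Nomad).
systemd-run --scope -p MemoryMax=16G ./run-node-benchmark.sh

# 2. GHC runtime cap: limit the Haskell-managed heap well below the
#    cgroup limit, leaving headroom for LMDB's own (non-GHC-managed)
#    memory.
cardano-node run <args> +RTS -M12G -RTS
```

Keeping the GHC heap limit safely below the cgroup limit is the "double-pronged" part: it ensures the Haskell runtime degrades gracefully instead of the OS OOM-killing the Node once LMDB's memory is added on top.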

Infrastructure

The migration to the new benchmarking cluster is finalized. For representative performance measurements of UTxOs-on-disk, we require direct SSD storage on each cloud machine instance. Along with deploying the new hardware came a full upgrade of OS and nixpkgs software. Validating the migration took extra effort: A seemingly innocent cloud provider service (which was newly added to their NixOS image) did nothing more than a heartbeat request every 5 min to some central infrastructure server. Yet, it caused the standard deviation of some of our network-related metrics to double - thus reducing confidence in those metrics.

After validation, we performed a complete re-run of existing performance baselines on the new hardware.

Tracing

Work on the new tracing system has yielded various improvements. Trace forwarding over TCP is now fully functional and merged to master. This will make setting up forwarding to remote hosts much easier than by using UNIX domain sockets / Windows named pipes. However, it's recommended for secure and isolated environments only (cardano-node PR#6241).

The auto-documentation feature of the new system has improved user experience; the generated documents are now structured in a more accessible way, and contain the necessary metadata as to which Node version / commit they document (cardano-node PR#6283).

Another refactoring targeted the new system's core library, trace-dispatcher. It now comes with a minimal dependency footprint, fully equipped with all necessary type definitions, and can be built using Haskell's community packages (Hackage) exclusively. This makes it much easier to use in applications other than the Node - Cardano or non-Cardano (cardano-node PR#6268). Increasing the dependency footprint is only required for additional features, like trace forwarding.

With the upcoming Node 10.6 release, we plan to officially deprecate the legacy tracing system. This means it will enter a grace period of ~3 months, where both systems coexist in the same Node build; then, it will be decommissioned and actively removed from the code base.

Meetup

We organized 2025's in-person team meetup in Cardiff, Wales. We had highly insightful and productive days - I would like to thank all team members who contributed, and extend my special thanks to guests from outside of the team: Your presence and contributions were greatly appreciated, Can Huzmeli and Neil Davies.

· 4 min read
Michael Karg

High level summary

  • Benchmarking: Feature benchmarks for ledger metrics tracer and InboundGovernor optimizations.
  • Development: Ledger metrics merged; 2 hotfixes for old tracing.
  • Infrastructure: Migration plan for on-disk benchmarks (LMDB, LSM-tree); initial Leios impact analysis.
  • New Tracing: Tracer service now independent of Node; new feature enabling forwarding over TCP.

Low level overview

Benchmarking

We've completed two distinct feature benchmarks: The new periodic ledger metrics tracer and InboundGovernor optimizations on the network layer. Both features have shown a positive performance impact; the former improves CPU usage and block production metrics, the latter slightly improves diffusion metrics.

Development

Having finalized and benchmarked the periodic ledger metrics tracer feature, it was merged to master and will be part of the upcoming 10.5 release. The feature decorrelates obtaining several metrics from the beginning of the forging loop. This avoids competition for synchronization primitives during the "hot phase" of block production. Furthermore, by decoupling those metrics from a forging tracer, we enable exposing those metrics from a relay as well. cardano-node PR#6180
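The underlying pattern - sampling metrics on a timer in a separate thread, so the forging loop never contends for the same synchronization primitives - can be sketched like this (illustrative only, not the actual implementation):

```haskell
-- Sketch of periodic metrics sampling decoupled from a hot loop.
-- Hypothetical names; not the cardano-node ledger metrics code.
import Control.Concurrent (forkIO, threadDelay)
import Data.IORef

-- Spawn a sampler that periodically snapshots a metric source into an
-- IORef; readers (e.g. a metrics endpoint, or a relay without a forging
-- loop) read the latest snapshot without blocking the producer.
startPeriodicSampler :: Int -> IO a -> IO (IORef (Maybe a))
startPeriodicSampler intervalUs sample = do
  snapshot <- newIORef Nothing
  _ <- forkIO $
    let loop = do
          v <- sample
          writeIORef snapshot (Just v)
          threadDelay intervalUs
          loop
    in loop
  pure snapshot
```

Because the snapshot lives outside the forging path, the same mechanism also works on relays, which is exactly what the merged feature enables.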

Additionally, we've been vital in creating two hotfixes for the old tracing system:

  1. The old tracing system metric utxoSize was missing due to using the pre-UTxO-HD variant of querying the set size. The fix ports the correct solution from the new tracing system to the old one: cardano-node PR#6217
  2. On the upcoming Node 10.5 integration branch only, the old tracing system could leak file descriptors. Again, the fix was ported from the new tracing system to the old one - kudos to Karl Knutsson: iohk-monitoring PR#654

Infrastructure

We've discussed and set up a migration plan for our benchmarking cluster hardware. For fair and representative performance measurements of on-disk backing stores of UTxO-HD, we require direct SSD storage on the machine instance in the cloud; running disk I/O through additional layers to and from some shared SSD device, even in the same data center, would introduce significant confounding factors. The plan includes invalidating as little of our existing performance baselines as possible when migrating to the new hardware. We're looking forward to benchmarking the current on-disk backend (LMDB) for block producers - as well as the future LSM-tree based one.

We've also discussed an initial Leios impact analysis. To fairly and reliably benchmark a future Leios implementation, our infrastructure and tooling will need to be extended significantly. Several metrics won't have the same weight they currently carry for Praos, due to Leios' later finality; other metrics will need to be introduced for different new Leios block types, adding appropriate observability to the implementation. Finally, creating and submitting a saturation workload for a system which is built for extremely high throughput will be a challenge in itself.

New Tracing

We've been working on a medium-sized refactoring that eliminates the cardano-node dependency from cardano-tracer. This means the tracer service can now be built independently of the Node; all shared data types have been moved to more basic packages of the new tracing system. This also enables us to issue releases of the tracer service independently of the Node's release cycle. cardano-node PR#6125

Last but not least, we've kicked off development of a new feature that's been motivated by community feedback: forwarding observables (trace messages, metrics) over TCP. Forwarding to different hosts currently assumes a UNIX domain socket that connects the Node and the tracer service through an SSH tunnel. This is a portable, versatile, and probably one of the most secure ways to transmit sensitive data. However, in an environment where an operator controls all network port mapping and firewalls, one can argue that forwarding over TCP/IP is equally viable, as it can be properly isolated - and it is much more convenient to set up and configure. When completed, the feature will offer both forwarding routes and let the end user decide.
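The resulting configuration choice could be modeled as a simple sum type (an illustrative sketch; these are not the actual cardano-tracer configuration names):

```haskell
-- Sketch of the "let the user decide" design: both transports behind
-- one sum type.  Hypothetical names, not cardano-tracer's config schema.
data ForwardingTarget
  = LocalPipe FilePath       -- UNIX domain socket / Windows named pipe
  | RemoteSocket String Int  -- TCP host and port; isolated networks only
  deriving (Eq, Show)

describe :: ForwardingTarget -> String
describe (LocalPipe p)      = "forwarding over local pipe " ++ p
describe (RemoteSocket h p) = "forwarding over TCP to " ++ h ++ ":" ++ show p
```

Code downstream of such a type handles both routes uniformly, which is what keeps the two setups interchangeable for the operator.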

· 3 min read
Michael Karg

High level summary

  • Benchmarking: 10.4.1 release benchmarks; UTxO-HD, GC settings and socket I/O feature benchmarks.
  • Development: Abstracting over quick queries and trace queries; enabling query processing on remote hosts.
  • Infrastructure: Workbench simplification merged; GHC8.10 tech debt removed.
  • New Tracing: Provided hotfix for several metrics.

Low level overview

Benchmarking

We've completed release benchmarks for Node 10.4.1. It is the first mainline release of a UTxO-HD node featuring LedgerDB. Leading up to the release, we previously performed and analysed UTxO-HD benchmarks. We were able to document a regression in RAM usage, and assisted in pinpointing its origin, leading to it being fixed swiftly for the 10.4 release.

Additionally, we ran feature benchmarks for a potential socket I/O optimization in the network layer, and GC setting changes catering to the now-default GHC9.6 compiler. Both benchmarks have shown moderate improvements in various performance metrics. This might enable the network team to pick up the optimization for 10.5. Also, we might be able to update the recommended GC settings for block producers, and add them to our own nix service configs for deployment.

The 10.4.1 performance report has been published on Cardano Updates.

Development

We've further evolved the (still experimental) quick query feature of our analysis tool locli. Parametrizable quick queries allow for arbitrary queries into raw benchmarking data, uncovering correlations not part of standard analysis. They are implemented using composable definitions inside a filter-reduce framework. With locli's DB storage backend, we can leverage the DB engine to do much of the work. Now, we're integrating a precursor to quick queries - so-called trace queries - into the framework. Those can process raw trace data from archived log files. Currently, we're adding an abstraction layer such that it is opaque to the framework whether the data was retrieved (and possibly pre-processed) from a DB or from raw traces.
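The filter-reduce idea can be sketched as follows (hypothetical types; not the locli implementation): a query pairs a predicate that selects trace records with a fold that reduces them, and such definitions compose naturally.

```haskell
-- Sketch of a composable filter-reduce query over trace records.
-- Illustrative types only; locli's actual framework differs.
import Data.List (foldl')

data Trace = Trace { traceKind :: String, traceValue :: Double }

-- A query = a predicate selecting traces, a reducer, and a seed.
data Query r = Query
  { qFilter :: Trace -> Bool
  , qReduce :: r -> Trace -> r
  , qInit   :: r
  }

runQuery :: Query r -> [Trace] -> r
runQuery q = foldl' (qReduce q) (qInit q) . filter (qFilter q)

-- Example: sum and count of all traces of one kind (mean = sum / count).
sumAndCountOf :: String -> Query (Double, Int)
sumAndCountOf kind = Query ((== kind) . traceKind)
                           (\(s, n) t -> (s + traceValue t, n + 1))
                           (0, 0)
```

Because a query is just data, the same definition can run against rows streamed from the DB backend or against raw traces parsed from log files - which is precisely the abstraction layer being added.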

Furthermore, we added a custom (CBOR-based) serialization for intermediate results so a query can be evaluated on a remote machine - like the system archiving all benchmarking runs - but triggered, and its results visualized, on your localhost.

Infrastructure

The workbench nix code optimization has finally been merged. Redundant derivations and recursions have been replaced; many nix store entries have been consolidated. Among other things, the new code also aims to maximize nix cache hits. Furthermore, as GHC8.10 has now been officially retired from all build pipelines, we were able to clean up all tech debt in our automations that we had to keep around due to supporting the old compiler version.

Exactly as we had hoped, this has brought down CI time for the Node by orders of magnitude; first, from over an hour to around 15 min, then to under 10 min. Also, all workbench shell invocations are significantly faster, and clutter in the nix store is greatly reduced.

New Tracing

We've been hurrying to provide hotfixes for connectionManager_* and slotsMissed metrics that were faulty on Node 10.3. They have been successfully integrated into the Node 10.4 release.

· 4 min read
Michael Karg

High level summary

  • Benchmarking: 10.3.1 release benchmarks.
  • Development: Plutus script calibration tool and profile maintenance updates about to be merged.
  • Infrastructure: Workbench simplification about to be merged.
  • New Tracing: System dependencies untangled; preparing 'Periodic tracer' feature for production.
  • Node Diversity: Participation in Conformance Testing workshop in Paris.

Low level overview

Benchmarking

We're currently running release benchmarks for the upcoming Node 10.3.1 version - a candidate for Mainnet release. Having taken previous measurements on the release integration branch, we expect the results to be closely aligned with those.

Node 10.3.1 will support two compiler versions, namely GHC8.10.7 and GHC9.6.5. As a consequence, we benchmark both Node builds and compare against the previous performance baseline 10.2. So far, the release benchmarks confirm performance improvements in both resource usage and block production metrics seen on the integration branch - for both compiler versions. A final report will be published on Cardano Updates.

Development

The first version of our new tool calibrate-script is about to be merged. It is part of the tx-generator project, and calibrates Plutus benchmarking scripts according to a range of constraints on the expected workload. The tool documents the result and all intermediate steps in a developer-friendly way. A CSV report is generated which shows all properties of a calibration at a glance: how much of each execution budget was given and how much was used, whether memory or CPU steps were the limiting factor for the script, how large the resulting transaction will be, what it will cost, and more. Apart from greatly speeding up development of Plutus benchmarks for our team, this tool can also be used to assess changes to Plutus cost models, or the efficiency of different Plutus compiler outputs - without running a full benchmark.
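The "limiting factor" determination such a report boils down to can be sketched like this (illustrative names and logic, not the actual calibrate-script code):

```haskell
-- Sketch of the limiting-factor computation: given per-script budgets
-- and measured usage, which resource caps the script first?
-- Hypothetical names; not tx-generator code.
data Budget = Budget { memUnits :: Double, cpuSteps :: Double }

-- Utilization of each budget dimension, as a fraction of the limit.
utilization :: Budget -> Budget -> (Double, Double)
utilization limit used =
  (memUnits used / memUnits limit, cpuSteps used / cpuSteps limit)

-- The dimension with the higher utilization is the limiting factor.
limitingFactor :: Budget -> Budget -> String
limitingFactor limit used =
  let (m, c) = utilization limit used
  in if m >= c then "memory" else "cpu-steps"
```

Reporting both utilization fractions side by side is what makes it obvious, at a glance, how much calibration headroom remains in the non-limiting dimension.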

Furthermore, the benchmarking profiles defined in cardano-profile have undergone a large maintenance cycle. Besides a cleanup, several profiles were fixed with regard to transaction fees or duration; others now run on a more appropriate performance baseline. The era dependency of profiles requiring a minimum protocol version has been resolved such that it's now impossible to construct incompatible profiles by definition - e.g. a PlutusV3 benchmark in any era prior to Conway. The corresponding PR is about to be merged shortly.

Infrastructure

A large PR simplifying the build of our performance workbench has been finalized and passed testing. The nix code has been greatly optimized to avoid redundant derivations and creating an abundance of nix store paths. This not only makes the workbench better maintainable, it greatly reduces time and size requirements for CI jobs. In testing, we could observe a speedup of 40% - 50% for CI. Additionally, this PR prepares for the future removal of GHC8.10 as a release compiler - which will reduce CI cost even more. The PR is currently under review and to be merged soon.

New Tracing

The work on untangling dependencies in the new tracing system has entered testing phase. The cardano-tracer service no longer depends on the Node - with common data types and typeclass instances having been refactored to a more basic package of the tracing system. Once merged, this will allow for the service to be built, released and operated independently of cardano-node, widening its range of use cases.

On Node 10.1, we've built a prototype of the 'Periodic tracer' feature. It decorrelates tracing ledger metrics from the start of a block producer's forging loop, thus removing competition on certain synchronization primitives. We've already shown in past benchmarks it had a positive impact on block production performance. This prototype is now being developed for production release, complete with configuration options, and we aim to land it in Node 10.4.

Node Diversity

We've contributed to the recent Conformance Testing workshop in Paris. The topic was how to approach detection and documentation of system behaviour across diverse Cardano Node implementations: where does the behaviour conform to some blueprint, and where does it deviate - intentionally or accidentally? Our tracing system is the prime provider of observability - and all evidence of program execution could in theory be checked against a machine-readable model of the blueprint. This of course assumes observables are implemented uniformly across diverse Node projects, i.e. without changing semantics. Thankfully, our tracing system lead engineer Jürgen Nicklisch was able to join that workshop and add to the discussions around that approach.