
48 posts tagged with "performance-tracing"


· 5 min read
Michael Karg

High level summary

  • Benchmarking: Preliminary 10.3 benchmarks; GHC8 / GHC9 compiler version comparison; Plutus budget scaling; runtime parameter tuning on GHC9.
  • Development: Started new Plutus script calibration tool; maintenance updates to benchmarking profiles.
  • Infrastructure: Adjusted tooling to latest Cardano API version; simplification of performance workbench nearing completion.
  • New Tracing: Battle-tested metrics monitoring on mainnet; generalized nix service config for cardano-tracer.

Low level overview

Benchmarking

We've run and analyzed several benchmarks these last two weeks:

Preliminary 10.3 integration

As performance improvement is a stated goal for the 10.3 release, we became involved early in the release cycle. Benchmarking the earliest version of the 10.3 integration branch, we could already determine that the optimization effort has yielded promising results, confirming improvements in both resource usage and block production metrics. A regular release benchmark will be performed, and published, from the final release tag.

Compiler versions: GHC9.6.5 vs. GHC8.10.7

So far, code generation with GHC9.6 has resulted in a performance regression for block production under heavy load - we've established that in various past benchmarks. The optimization efforts on 10.3 also focused on removing that performance blocker. Benchmarking the integration branch with the newer compiler version has now confirmed that the regression has vanished; moreover, code generated with GHC9.6 even exhibited slightly more favourable performance characteristics. So in all likelihood, Node 10.3 will be the last release to support GHC8.10, and we will recommend GHC9.6 as the default build platform for it.

Plutus budget scaling

We've repeated several Plutus budget scaling benchmarks on Node version 10.3 / GHC9.6. By scaling execution budgets to 1.5x and 2x their current mainnet values, we can determine the performance impact on the network of potential increases to said budgets. We independently measured bumping the steps (CPU) limit with a CPU-intensive script, and bumping the memory limit with a script performing lots of allocations. In each case, we observed the performance impact to scale linearly with the limit bump. This makes the impact predictable when suggesting changes to mainnet protocol parameters.
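
As a rough illustration of that linearity, the following minimal Haskell sketch scales a budget by a factor and extrapolates a measured baseline metric by the same factor. The figures are placeholders, not the actual mainnet parameters or measurements:

```haskell
data ExecutionBudget = ExecutionBudget
  { steps  :: Integer   -- CPU budget (ExUnits steps)
  , memory :: Integer   -- memory budget (ExUnits memory)
  } deriving Show

scaleBudget :: Rational -> ExecutionBudget -> ExecutionBudget
scaleBudget k (ExecutionBudget s m) =
  ExecutionBudget (round (fromInteger s * k)) (round (fromInteger m * k))

-- Under linear scaling, a metric measured at the baseline budget extrapolates
-- to a scaled budget by the same factor.
extrapolate :: Rational -> Double -> Double
extrapolate k baselineMetric = fromRational k * baselineMetric

main :: IO ()
main = do
  let baseline = ExecutionBudget { steps = 10000000000, memory = 14000000 }  -- placeholder values
  mapM_ (print . (`scaleBudget` baseline)) [1.5, 2]
  print (extrapolate 1.5 0.042)   -- made-up baseline of 0.042s CPU per block, for illustration
```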

Our team presented those findings and the data to the Parameter Committee for discussion.

Runtime system (RTS) settings

The recommended RTS settings for cardano-node encompass the number of CPU cores to use, the behaviour of the allocator, and the behaviour of the garbage collector. The recommendations so far are tuned to GHC8.10's RTS - one cannot assume the same settings are also optimal for GHC9.6's RTS. So we've started a series of short, exploratory benchmarks comparing a matrix of promising changes, in order to update our recommendation in the future.
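
For illustration only - this is not our recommendation, which is still being worked out - these are the kinds of RTS knobs the matrix covers, passed to the node at startup:

```
#  -N<n>    number of capabilities (CPU cores) the RTS may use
#  -A<sz>   size of the allocation area per capability (allocator behaviour)
#  -I<s>, -qg, --nonmoving-gc   garbage collector behaviour
cardano-node run <...> +RTS -N4 -A16m -I0 -RTS
```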

Development

We've started to develop a new tool that calibrates our Plutus benchmarking scripts given a range of constraints on the expected workload. These constraints entail exhausting a certain budget (block or transaction), or keeping a constant number of transactions per block while exhausting the available steps or memory budget(s). The result directly serves as input to our benchmarking profile definition. This tool may also be of wider interest, as it allows for modifying various inputs, such as Plutus cost models, or script serializations generated by different compilers or compiler versions. That way, one can compare at a glance how effectively a given script makes use of the available budgets under a specific cost model.
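
A minimal sketch of the calibration idea (hypothetical, not the actual tool): given a function that yields the execution cost of a looping script at a given iteration count - in the real tool this would come from evaluating the script against a concrete cost model - find the largest iteration count that still fits the target budget.

```haskell
data ExUnits = ExUnits { exSteps :: Integer, exMem :: Integer } deriving Show

fitsWithin :: ExUnits -> ExUnits -> Bool
fitsWithin used budget = exSteps used <= exSteps budget && exMem used <= exMem budget

-- 'costOf' stands in for evaluating the script under a concrete cost model.
calibrate :: (Integer -> ExUnits) -> ExUnits -> Integer
calibrate costOf budget = go 0 upper
  where
    -- first find an iteration count that exceeds the budget ...
    upper = head [ n | n <- iterate (*2) 1, not (costOf n `fitsWithin` budget) ]
    -- ... then binary-search for the largest count that still fits
    go lo hi
      | lo + 1 >= hi                   = lo
      | costOf mid `fitsWithin` budget = go mid hi
      | otherwise                      = go lo mid
      where mid = (lo + hi) `div` 2
```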

Additionally, our benchmarking profiles are currently undergoing a maintenance cycle. This means setups whose motivation has ceased to exist are removed, several are updated to use the Voltaire performance baseline, and others are tested for conformity with the Plomin hard-fork protocol updates.

Infrastructure

The extensive work of simplifying the performance workbench is almost finished and about to enter the testing phase. In past PRs, we have been moving away from scripting towards declarative (Haskell) definitions of all benchmark profiles and workloads. The simplification work now reaps the benefits of that: we can optimize away many recursive or redundant invocations and nix evaluations, collate many nix store paths into just a couple, and reduce the workbench's overall closure size and complexity. Apart from saving significant resources and time for CI runners, this will reduce the maintenance effort necessary on our end.

Furthermore, we've done maintenance on our tooling by adjusting to the latest changes in cardano-api. This included porting the ProtocolParameters type and its type class instances over to us, as our use case requires that we continue supporting it. However, it's considered deprecated in the API, so this unblocks the team there to eventually remove it.

New Tracing

Having addressed all feature and change requests relevant for the Node 10.3 release, we performed thorough mainnet testing of the new system's metrics in a monitoring context. We relied on the extremely diligent and helpful feedback from the SRE team. This enabled us to iron out a couple of remaining inconsistencies - a big shout-out and thank you to John Lotoski.

Additionally, again with SRE, a nix service configuration (SVC) has been created for cardano-tracer; it is generalized and aligned in structure with the existing cardano-node SVC. It evolved from the existing SVC in our performance workbench, which was tied tightly to our team's use case. With the more general approach, we hope other teams, and the community, can reliably and easily set up and deploy cardano-tracer.

· 3 min read
Michael Karg

High level summary

  • Development: New benchmark epoch timeline using db-sync; raw benchmark data now with DB storage layer as default - enabling quick queries.
  • Infrastructure: Merged workbench 'playground' profiles - easing benchmark calibration.
  • New Tracing: Plenty new features based on community feedback - including a new direct Prometheus backend; untangle system dependencies.
  • Community: Participation in the first episode of the Cardano Dev Pulse podcast.

Low level overview

Development

For keeping a history of comparable benchmarks, it's essential to have an accurate timeline of mainnet protocol parameter updates by epoch. They represent the environment in which specific measurements took place, and are thus tied inherently to the observation process. Additionally, to reproduce specific benchmarking metrics from the past, our performance workbench has the capability to "roll back" those updates and perform a benchmark given the protocol parameters of any given epoch. Instead of maintaining this epoch timeline by hand, we've now created an automated way to extract, using db-sync, all key epochs that apply parameter updates. This approach will prove both more robust and lower in maintenance overhead.
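
A rough sketch of the extraction idea (not the actual workbench code): read the per-epoch protocol parameters that db-sync maintains, and keep only the epochs in which they changed. Table and column names follow our reading of the cardano-db-sync schema ('epoch_param') and the database name is an example; both may need adjusting.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Database.PostgreSQL.Simple

main :: IO ()
main = do
  conn <- connectPostgreSQL "dbname=cexplorer"   -- example db-sync database name
  rows <- query_ conn
            "SELECT epoch_no, max_block_size, max_tx_size FROM epoch_param ORDER BY epoch_no"
            :: IO [(Int, Int, Int)]
  let keyed     = [ (e, (bs, ts)) | (e, bs, ts) <- rows ]
      keyEpochs = [ e | ((_, prev), (e, cur)) <- zip keyed (drop 1 keyed), prev /= cur ]
  -- epochs whose (selected) parameters differ from the previous epoch's
  mapM_ print keyEpochs
```

The sketch only compares two parameter columns; the real timeline covers the full parameter set.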

Furthermore, the new DB storage backend for raw benchmarking data in locli is now set to be the default for the performance workbench. Apart from cutting down analysis time for a benchmarking run and reducing the required on-disk size for archiving, this enables the new (still under development) quick queries into raw performance data.

Infrastructure

When creating the Plutus memory scaling benchmarks, we developed so-called 'playground' profiles for the workbench. These allow for easier dynamic changes of individual profile parameters, building the resulting benchmark setup including Plutus script calibration, and observing the effect in a short local cluster run. Applying such changes to established profiles is strictly forbidden, as it would put comparability with past benchmarks at risk. By introducing this separation, we keep that safety guarantee, while still relaxing it somewhat for the development cycle only.

New Tracing

We've been extremely busy implementing new features and optimizations for the new tracing system, motivated by the feedback we received from the SPO community. This includes:

  • A brand new backend that allows for Prometheus exposition of metrics directly from the application - without running cardano-tracer and forwarding to it.
  • A configurable reconnection interval for the forwarder to cardano-tracer.
  • An always up-to-date symlink pointing to the most recent log file in a cardano-tracer log rotation.
  • Optimizations in metrics forwarding and trace message formatting, which should lower the base CPU usage, at least in low congestion scenarios.

All those will be part of the upcoming Node 10.3 release.

Currently, the cardano-tracer service still depends on the Node for a few data type definitions. We're working on a refactoring so we can untangle this dependency. This will allow for the service to be built independently of the Node - simplifying a setup where other processes and applications can forward observables to cardano-tracer and benefit from its features.

Community

We had the opportunity to talk about benchmarking and performance impact of UTxO-HD on the very first episode of the Cardano Dev Pulse Podcast (YouTube). Thank you Sam and Carlos for having us!

· 4 min read
Michael Karg

High level summary

  • Benchmarking: Plutus memory budget scaling benchmarks; UTxO-HD benchmarks, leading to a fixed regression; Genesis benchmarks.
  • Development: Ouroboros Genesis and UTxO-HD adjustments in workbench; Maintenance tasks.
  • Infrastructure: Removing outdated deployments; Haskell profile definition merged; workbench simplification ongoing.
  • Tracing: C library development ongoing; Feature implementation according to community feedback; test removal of the old system.

Low level overview

Benchmarking

We've run and analyzed scaling benchmarks of Plutus execution budgets. In this series of benchmarks, we measured the performance impact of changes to the memory budgets (both transaction and block). We observed an expected, and reasonable, increase in certain metrics only. Furthermore, we've shown this increase to be linearly correlated to the budget raise. This means that when exploring the headroom of those budgets, the performance cost for the network is always predictable. The benchmarks serve as a base for discussing potential changes to those budgets in Mainnet protocol parameters.

When building a performance baseline for UTxO-HD, we were able to locate a regression in its new in-memory backing store, LedgerDB V2. We created a local reproduction of that for the Consensus team, who was able to successfully address the regression. A corresponding benchmarking report will be published on Cardano Updates.

Furthermore, we've performed benchmarks with the Ouroboros Genesis feature enabled and compared them to the release benchmark baseline. We could not detect any performance risk to the network during "normal operations", i.e. when all nodes are caught up with the chain tip.

Development

During the course of building performance baselines for Ouroboros Genesis and UTxO-HD, we've developed various updates to the performance workbench to correctly handle the new Genesis consensus mode, as well as adjustments to the latest changes in the UTxO-HD node.

Additionally, we built several small quality-of-life improvements for the workbench, as well as investigated and fixed an inconsistent metric (Node Issue #6113).

Infrastructure

The recent maintenance work also extended to the infrastructure: We've removed the dependency on deprecated environment definitions in iohk-nix by porting the relevant configurations over to the workbench. This facilitates a thorough cleanup of iohk-nix by the SRE team.

As the Haskell package defining benchmarking profiles has been merged, and all code replaced by it successfully removed, we're now working very hard on simplifying the interaction between the workbench and nix. This mostly covers removing redundancies that have lost their motivation - both in how the workbench calls itself recursively multiple times, and in how (and how many) corresponding nix store paths are created when evaluating derivations. This will both enhance maintainability and result in much lighter workbench builds locally - but especially on CI runners.

Tracing

Work on the self-contained C library implementing trace forwarding is ongoing. As forwarding is defined in terms of an OuroborosApplication, it's non-trivial to re-implement the features of the latter - such as handshake, version negotiation, and multiplexing - in C as well. However, for the intended purpose of said library, it is unavoidable.

We've also started working on a new release of cardano-tracer, the trace / metrics consuming service in the new tracing system. This release is geared towards feature and change requests we've received from the community - feedback we found very valuable. Having a separate service to process all trace output enables us to react much quicker to this feedback, and to decouple delivery from the Node's release cycle.

Last but not least, we're doing an experimental run on creating a build of the Node with the legacy tracing system removed. As the system is deeply woven into the code base, and some of the new system's components keep compatibility with the old one, untangling and removing these dependencies is a large endeavour. This build serves as a prototype to identify potential blockers, or decisions to be made, and eventually as a blueprint for removing the legacy system in some future Node release.

· 4 min read
Michael Karg

High level summary

  • Benchmarking: Release benchmarks and performance baselines on 10.2 for UTxO-HD, new GHC, Genesis; 'Periodic tracer' benchmarks.
  • Development: Pervasive thread labeling in the Node; fix a race condition in monitoring dependency ekg-wai.
  • Infrastructure: Haskell profile definition work passed testing, ready for merge; continued 'Byron' support in our tooling.
  • Tracing: C library for trace forwarding reached prototype stage; last batch of documentation updates ready for publication.
  • Community: Support and valuable feedback on Discord for new tracing system rollout.

Low level overview

Benchmarking

We've performed a full set of release benchmarks and analyses for Node version 10.2. We could not detect any performance risks, and expect network performance to be equivalent to, or slightly better than, that of 10.1.x releases, albeit using slightly more CPU resources under rare conditions.

Furthermore, we're building several performance baselines with 10.2 against which to compare future changes, features, or node flavours. For comparative benchmarks, it's vital that every change be measured individually, so as to discern its individual performance impact. For Node 10.3, there are several of those we want to capture, such as crypto class simplifications in Ledger, UTxO-HD with a new in-memory backend, Ouroboros Genesis, and last but not least a new GHC9.6 release addressing a remaining performance blocker when building Cardano.

Additionally, we've validated the 'Periodic tracer' feature on cluster benchmarks and now have evidence of its positive impact on performance. This feature decorrelates gathering metrics from the ledger from the start of a block producer's forging loop, without sacrificing predictability of performance. By removing this competition on certain synchronization primitives, the hot code path in the forging loop now executes faster. The feature will be integrated in a future version of the Node.

Development

We've tracked down a race condition in a community package that both tracing systems depend on for exposing metrics. In ekg-wai, a ThreadKilled exception could be re-thrown to the thread it originated from. It is a low-risk condition, as it occurs only when the Node process terminates; however, when terminating due to an error condition, it caused the process to end prematurely, before the error could be logged. We've opened a PR (ekg-wai#12) against the package containing the fix, and pre-released it on CHaP.

Tracking down this condition would have been easier with pervasive, human-readable labels for all the threads the Node process spawns. So in coordination with the Consensus team, we made sure this is the case for future builds of the Node - including locations in the code where dependency packages internally use forkIO to create green threads. This will enhance the usability of debug output when looking into concurrency issues.
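
The mechanism itself is GHC's labelThread; a small standalone example (the thread name here is made up, not one used in the Node):

```haskell
import Control.Concurrent (forkIO, myThreadId, threadDelay)
import GHC.Conc (labelThread)

main :: IO ()
main = do
  _ <- forkIO $ do
         tid <- myThreadId
         labelThread tid "metrics-server"   -- label shows up in event logs / debug output
         threadDelay 1000000                -- stand-in for the thread's real work
  threadDelay 2000000
```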

Infrastructure

The Haskell definition of benchmarking workloads - and the removal of its bash/jq counterpart - is complete, and has passed the testing phase. This includes a final alignment of all profile content defined using either option. Once merged, this will open up the path for simplification of how nix interacts with the performance workbench - and hopefully reduce complexity for our CI runners.

As cardano-api is deprecating some protocol parameter related data types which no longer have relevance for Cardano, we've had a discussion with stakeholders about the implications for our tooling: this would effectively disable our ability to benchmark clusters of BFT nodes which do not use a staking / reward-based consensus algorithm - as was the case in Cardano's Byron era. The decision was made not to drop that ability from our tooling, as there are potential applications for the benchmarks outside of Cardano. As a consequence, we've started porting those types to live on in our toolchain, representing an additional maintenance item within our team.

Tracing

The self-contained C library implementing trace forwarding is now in prototype state. It contains a pure C implementation of our forwarding protocol over a socket, as well as pure C CBOR codecs for the data payload, matching the TraceObject schema used within the context of Cardano. That ensures existing tooling can process traces emitted by non-Cardano applications, written in languages other than Haskell.

The latest updates to Developer Portal: cardano-tracer are ready to be published and awaiting a PR review on the Cardano Developer Portal.

Community

We've been quite busy on our new Discord channel #tracing-monitoring on IOG's Technical Community server. There's been an initial spike of interest, and we've been able to provide support and explain various design decisions of the new tracing system. Additionally, we've received valuable feedback about potential features that would greatly help adoption of the new system. These are typically highly localized in their implementation, and non-breaking with respect to API and design, such that addressing this feedback promptly adds much value at low risk - thank you for your input!

· 4 min read
Michael Karg

High level summary

  • Benchmarking: Release benchmarks for Node 10.1.4; performance evaluation of ledger metrics trace location.
  • Development: Database-backed quick queries for locli analysis tool.
  • Infrastructure: Voting workload definition merged to master, work on Haskell profile definition now continues.
  • Tracing: C library for trace forwarding and documentation ongoing; improved fallback configs.
  • Community: new Discord channel #tracing-monitoring supporting new tracing system rollout.

Low level overview

Benchmarking

We've run and analyzed a full set of release benchmarks for Node version 10.1.4. We could not observe any performance risks, and expect network performance to very closely match that of previous 10.1.x releases.

Furthermore, we've been investigating the location on the 'hot code path' where metrics from the ledger are traced - such as UTxO set size or delegation map size. This currently happens at slot start, when the block forging loop is kicked off. We aim to decouple emitting those traces from the forging loop, and instead move them to a separate thread. This thread could potentially wake up after a pre-defined time has passed, e.g. 2/3 of a slot's duration. That would ensure getting those values out of the ledger does not occur simultaneously with block production proper.

Moreover, as a new feature, it would enable tracing those metrics on nodes that do not run a forging loop themselves. And last but not least, it would open the way to providing additional metrics at the new location - like DRep count, or DRep delegations - without negatively affecting performance. Initial prototyping has yielded promising results so far.
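
A simplified sketch of the idea (not the actual Node code): a separate thread wakes up a fixed fraction into each slot and emits ledger-derived metrics, so the forging loop no longer has to do it at slot start. Names like LedgerMetrics and the tracer callback are placeholders, and alignment to real slot boundaries is elided.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM (TVar, readTVarIO)
import Control.Monad (forever, void)

data LedgerMetrics = LedgerMetrics { utxoSize :: Int, delegMapSize :: Int }

periodicLedgerMetrics
  :: Int                        -- slot length in microseconds
  -> TVar LedgerMetrics         -- shared view of the latest ledger-derived metrics
  -> (LedgerMetrics -> IO ())   -- tracer callback
  -> IO ()
periodicLedgerMetrics slotLen metricsVar traceMetrics =
  void $ forkIO $ forever $ do
    threadDelay ((slotLen * 2) `div` 3)   -- wake up roughly 2/3 into the slot
    readTVarIO metricsVar >>= traceMetrics
```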

Development

Development has commenced on parametrizable quick queries, a new feature of our analysis tool locli. They rely on the new database storage backend for raw benchmarking data to be efficient. These quick queries are based on a filter-reduce framework with composable reducers, which provides a clean way to expose very specific data points or correlations from the raw benchmarking data.
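
An illustrative sketch of the filter-reduce idea (the actual API in locli differs): a quick query pairs a filter over raw data points with a reducer that folds whatever passes the filter into a result, and reducers compose so that several reductions run in a single pass.

```haskell
data Reducer a r = Reducer { step :: r -> a -> r, start :: r }

runQuery :: (a -> Bool) -> Reducer a r -> [a] -> r
runQuery keep (Reducer f z) = foldl f z . filter keep

-- Compose two reducers into one that runs both in a single pass.
both :: Reducer a r -> Reducer a s -> Reducer a (r, s)
both (Reducer f r0) (Reducer g s0) =
  Reducer (\(r, s) a -> (f r a, g s a)) (r0, s0)

-- Example reducers: a count, and a sum (for computing a mean afterwards).
countR :: Reducer a Int
countR = Reducer (\n _ -> n + 1) 0

sumR :: Reducer Double Double
sumR = Reducer (+) 0

-- e.g.: runQuery (> 2.0) (sumR `both` countR) adoptionTimes
```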

The quick query feature also incorporates ad-hoc plotting of query results; exporting results to exchange formats like CSV or JSON will be added in the future.

Infrastructure

The voting workload definition has been cleanly integrated with the workbench. This also includes an abstract definition of concurrent workloads - which was previously unnecessary, as exactly one workload used to be handled by exactly one service. The integration, along with the added flexibility, has been merged to master.

We're now actively working again on the Haskell definition of benchmarking workloads, including a test suite. Most of this improvement had already been done; it still needs a final realignment with the current state of all existing workloads. It will allow us to trade hard-to-maintain, large jq definitions for concise, testable code, and recursive shell script invocations for a single use of a well-defined command line interface.
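
A condensed sketch of what a declarative workload definition can look like (field names are illustrative, not the workbench's actual types): plain Haskell values can be inspected and checked in a test suite instead of being assembled by jq at run time.

```haskell
data Workload = Workload
  { wlName       :: String
  , wlTxsPerSec  :: Double       -- submission rate
  , wlTxFanOut   :: Int          -- inputs/outputs per transaction
  , wlPlutusLoop :: Maybe Int    -- iterations of a calibrated Plutus script, if any
  } deriving (Show, Eq)

valueOnly, plutusLoop :: Workload
valueOnly  = Workload "value-only"  12.0 2 Nothing
plutusLoop = Workload "plutus-loop" 12.0 2 (Just 1000)
```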

Tracing

Good progress has been made on the small, self-contained C library that implements trace forwarding. It will allow processes in any language that can call into C via a foreign function interface to use cardano-tracer as a target for forwarding traces and metrics. The initial prototype has already evolved into a library design which intends to offer the host application a simple way to encode to Cardano's schema of trace messages - and to use its forwarding protocol asynchronously, so as to minimize interruption of the application's native control flow.

In preparation for the new tracing system's release, we've also revisited the fallback configuration values the system will use if it is accidentally misconfigured by the user. The forwarder component uses a bounded queue as a buffer for trace output, to compensate for a possibly unreliable connection to cardano-tracer. The fallback bounds were chosen to conserve trace output at all cost - as it turns out, at too high a memory cost if trace forwarding does not happen at all due to faulty configuration. We've adjusted this and other fallback values to sensible defaults, to guarantee a functional system even in the case of configuration errors.
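
A minimal sketch of the buffering strategy (not the forwarder's actual code): trace messages go through a bounded queue towards the forwarding connection, and once the bound is reached new messages are dropped rather than growing memory without limit. The bound of 2048 is an arbitrary example value.

```haskell
import Control.Concurrent.STM

-- Enqueue a trace message, dropping it if the buffer is full;
-- returns whether the message was accepted.
offer :: TBQueue a -> a -> IO Bool
offer q msg = atomically $ do
  full <- isFullTBQueue q
  if full then pure False else True <$ writeTBQueue q msg

main :: IO ()
main = do
  buffer <- newTBQueueIO 2048        -- bounded buffer between tracer and forwarder
  accepted <- offer buffer "some trace message"
  print accepted
```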

Community

Our team will host a new channel #tracing-monitoring on IOG's Technical Community Discord server. The migration to the new tracing system might affect existing automations built by the community, or existing configurations may need adjusting to achieve the intended outcome. In the channel, we'll offer support for the community in all those regards, as well as answer more general questions regarding the Node's tracing systems.

Additionally, we're currently releasing our documentation improvements to the excellent Cardano Developer Portal, linked below.