
Performance & Tracing Update

· 5 min read
Michael Karg
Performance and Tracing Team Lead

High level summary

  • Benchmarking: Release benchmarks for 11.0.1; Feature benchmarks for: TxSubmissionLogicV2; Compiler version.
  • Development: Removal of legacy tracing completed - not yet merged.
  • Infrastructure: Genesis caching and post-processing completed - not yet merged.
  • Tracing: cardano-tracer HTTP API for metrics timeseries queries and Grafana datasource - not yet merged.
  • Leios: Leios/Mempool benchmarks using tx-centrifuge.
  • Node Diversity: Formal trace schema definition merged; Conformance framework to be presented at Porto workshop.

Low level overview

Benchmarking

We've performed, analysed and published release benchmarks for Node version 11.0.1. The release shows no performance regressions compared to 10.7.1. These benchmarks ran under Protocol Version 11 and were required to ensure there's no performance risk in using this version.

Furthermore, we've run feature benchmarks for a new incarnation of v2 of the tx submission logic. The new logic is an optimization and aims, among other things, to reduce redundancy in tx diffusion. While the feature is experimental, the benchmarks provided valuable measurements and data for the network team to move it forward.

Additionally, we've re-run benchmarks using the GHC9.12 compiler version on the new 11.0.1 baseline; since 10.6.2, there have been many changes in Ledger which impact generated code and compiler optimizations. While there's no fundamental performance blocker to use this more recent compiler on our code base, there are still a few unknowns. The data is currently still under review and discussion.

Development

With the upcoming 11.1 release, the legacy tracing system 'iohk-monitoring-framework' will finally be removed from the Node. The change is extensive, as it involves large differences in project dependencies, in code, in configuration and in test suites. The old and new tracing systems have been part of the Node build side-by-side for roughly two years now, with the new tracing system gaining wider adoption over the last half year. Removing the need to stay backwards compatible with the legacy system within the same build unblocks several planned features for the new system, and finally allows moving it out into its own self-contained Hermod Tracing System project repository.

While the implementation is complete, cardano-node PR#6580 is currently still in draft state, awaiting full verification and testing.

Infrastructure

The modularization of our automation's genesis cache is completed. In addition to quickly stitching together a custom genesis with a huge amount of injected staking data, it allows for all protocol-relevant fields of genesis to be freshly generated by cardano-cli - and not taken from the cache. This means the post-processing has now been reduced to a minimum. That improves confidence in the benchmarking profiles: we no longer have to test that workbench changes are still correctly patched onto potentially very long-lived cache entries on a variety of hosts.

Moreover, this change includes a proper profile overlay for Protocol Version 11, which includes changes to Plutus cost models and execution budgets that have already been submitted as a gov action on Mainnet. The (quite extensive) PR is currently in draft state and under testing: cardano-node PR#6544.

Tracing

The new version of cardano-tracer will come with an (opt-in via config) HTTP REST API to query metrics timeseries directly. As cardano-tracer can now store metrics of all connected Nodes, it's able to evaluate PromQL-like queries directly. This can be used as an alternative to having Prometheus scrape all of those processes. With the new release, we aligned the (previously experimental) API as closely as possible with what users are accustomed to from Prometheus, and it has now reached reasonable stability.
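To give a feel for what evaluating a PromQL-like query over stored metrics involves, here is a minimal Python sketch. It is not the cardano-tracer API or its implementation: the store layout, the functions `range_select` and `rate`, and the metric name `blocks_forged_total` are all hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory store: metric name -> list of (unix_seconds, value),
# sorted by timestamp. This is NOT the actual cardano-tracer data model.
Series = List[Tuple[float, float]]
Store = Dict[str, Series]


def range_select(store: Store, metric: str, end: float, window: float) -> Series:
    """Return the samples of `metric` in the half-open interval (end - window, end]."""
    return [(t, v) for (t, v) in store.get(metric, []) if end - window < t <= end]


def rate(store: Store, metric: str, end: float, window: float) -> float:
    """Per-second increase of a monotonically increasing counter over the window.

    Simplified on purpose: no counter-reset handling and no extrapolation to
    the window boundaries, both of which Prometheus' own rate() performs.
    """
    samples = range_select(store, metric, end, window)
    if len(samples) < 2:
        return 0.0
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


# Hypothetical counter samples for one node.
store: Store = {
    "blocks_forged_total": [(0.0, 10.0), (20.0, 11.0), (40.0, 13.0), (60.0, 14.0)],
}

# Roughly analogous to the PromQL expression rate(blocks_forged_total[60s]) at t=60.
result = rate(store, "blocks_forged_total", end=60.0, window=60.0)
```

A store holding the samples of all connected Nodes can answer such range queries itself, which is why a separate scraper is not strictly needed in this setup.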

Moreover, we built - from scratch - a Grafana datasource using that API. This datasource contains a dashboard to replace the deprecated 'RTView' component of cardano-tracer, and is intended to serve as a reference for the community to define their own dashboards and queries according to their monitoring needs.

This PR, too, is fairly extensive, and also contains several improvements and fixes of the underlying cardano-timeseries-io package: cardano-node PR#6562, currently in testing phase.

Leios

We've created, and performed, full cluster benchmarks for Leios - using our new high-pressure submission tool tx-centrifuge. The point of interest of these benchmarks was observing Mempool behaviour under various levels of fragmentation and various capacity configurations. These benchmarks are meant to close a gap to the Leios simulations by providing evidence: they measure concrete timings of a concrete Mempool implementation. The benchmarks have shown that a standard Mempool tuned to Praos will likely throttle maximum throughput for Leios. With this benchmark at hand, and Mempool identified as a potential bottleneck, the necessary adjustments or optimizations can always be confirmed and backed up by evidence.

Node Diversity

The comprehensive formal schema definition of the Node's existing trace messages has been merged (cardano-node PR#6527). This encodes the syntax and semantics of all the observable events that the Haskell Node implementation provides. Thus, it can serve as a reference to what diverse clients may implement - to gain comparability in protocol conformance, network performance, and the reuse of existing tooling relying on those observables.

The cardano-recon-framework is one example of such tooling. We've continuously improved our Linear Temporal Logic based trace verifier for system behaviour, and we've defined several interesting properties that can be checked continuously from Node logs. A member of our team will attend the Node Diversity workshop in Porto at the beginning of June, and contribute a presentation and a demo of this framework.
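As an illustration of what checking a Linear Temporal Logic property against a log of trace events can look like, here is a minimal Python sketch for finite traces. It is not the cardano-recon-framework's implementation or its property language: the combinators below and the event kinds "Requested" and "Completed" are hypothetical.

```python
from typing import Callable, Dict, List

Event = Dict[str, object]
# A formula is evaluated against a finite trace at a given position.
Formula = Callable[[List[Event], int], bool]


def atom(pred: Callable[[Event], bool]) -> Formula:
    """Atomic proposition: `pred` holds for the event at the current position."""
    return lambda trace, i: pred(trace[i])


def implies(a: Formula, b: Formula) -> Formula:
    return lambda trace, i: (not a(trace, i)) or b(trace, i)


def eventually(a: Formula) -> Formula:
    """F a: `a` holds at some position from i up to the end of the trace."""
    return lambda trace, i: any(a(trace, j) for j in range(i, len(trace)))


def always(a: Formula) -> Formula:
    """G a: `a` holds at every position from i up to the end of the trace."""
    return lambda trace, i: all(a(trace, j) for j in range(i, len(trace)))


def holds(formula: Formula, trace: List[Event]) -> bool:
    """Evaluate `formula` at the start of a non-empty finite trace."""
    return formula(trace, 0)


# Hypothetical event kinds, for illustration only.
requested = atom(lambda e: e["kind"] == "Requested")
completed = atom(lambda e: e["kind"] == "Completed")

# G (Requested -> F Completed): every request is eventually followed by a completion.
prop = always(implies(requested, eventually(completed)))

good_trace = [{"kind": "Requested"}, {"kind": "Other"}, {"kind": "Completed"}]
bad_trace = [{"kind": "Completed"}, {"kind": "Requested"}]

good_ok = holds(prop, good_trace)
bad_ok = holds(prop, bad_trace)
```

On a finite log, "eventually" can only be judged up to the last recorded event, so a request near the end of a log may be reported as unanswered; a continuously running checker has to account for that.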

Network Team Update

· 4 min read
Marcin Szamotulski
Network Team Lead

Overview of sprint 114.

Summary

Leios

The TxSubmission v2 demo was merged, providing an end-to-end demonstration of TxSubmission V2. A benchmark study of incremental vs. non-incremental CBOR block decoding shows that decoding blocks incrementally yields modest improvements for standard Praos blocks, and larger gains for multi-megabyte blocks — directly relevant to Leios where block sizes may grow substantially. Cardano-base was bumped to include support for Leios voting. A simple TCP congestion-window model was added to the Leios Rust simulator (sim-rs), porting the model from the Haskell simulator; mini-protocol multiplexing remains a gap to be addressed. An alternative TxSubmission V2 design without a central decision thread is under review, superseding the earlier exploration.
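To illustrate why a congestion-window model matters once blocks reach multiple megabytes, here is a minimal Python sketch of a loss-free TCP sender with slow start followed by congestion avoidance. It is not the model used in sim-rs or in the Haskell simulator: the function `rtts_to_send` and all parameter defaults (segment size, initial window, slow-start threshold) are assumptions made for illustration.

```python
import math


def rtts_to_send(
    payload_bytes: int,
    mss: int = 1460,       # assumed segment size in bytes
    init_cwnd: int = 10,   # assumed initial window, in segments
    ssthresh: int = 64,    # assumed slow-start threshold, in segments
) -> int:
    """Round trips needed to deliver `payload_bytes` on a loss-free path.

    Each round trip sends one full congestion window. Below `ssthresh` the
    window doubles per round trip (slow start); at or above it the window
    grows by one segment per round trip (congestion avoidance).
    """
    segments_left = math.ceil(payload_bytes / mss)
    cwnd = init_cwnd
    rtts = 0
    while segments_left > 0:
        segments_left -= cwnd
        rtts += 1
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)
        else:
            cwnd += 1
    return rtts


small_block_rtts = rtts_to_send(90_000)       # a block of roughly 90 kB
large_block_rtts = rtts_to_send(10_000_000)   # a multi-megabyte block
```

Under these assumptions, delivery time is driven by the number of round trips rather than by raw bandwidth alone, and that number grows with block size.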

Ouroboros-Network

bracketFetchClient and bracketKeepAlive were decoupled by splitting FetchClientRegistry into KeepAliveRegistry and FetchClientRegistry — a prerequisite for DMQ-Node integration. Contra-tracer was upgraded to 0.2.1, unblocking the removal of iohk-monitoring-framework from cardano-node 11.1. The peer-selection governance target was renamed from selectEnvTargets to selectGovTargets. Various tracing and API improvements were merged as part of ongoing DMQ-Node integration work. A new protocol version NodeToClientV_24 for the ValidateTx local-state-query is under review.

DMQ-Node

Version 0.5.0.0 was released. The announcyness metric for peer selection was merged: it scores peers by how often they are first to announce a valid signature, with scores kept for one hour, matching the churn rate. SigId validation was improved: CBOR encoding was corrected to use Hash Blake2b_256 for the sigId field, and SigExpired failures are now restricted to the ZeroSetSnapshot case. An ouroboros-network integration update is in review. Work on connecting the DMQ kernel to cardano-node continues, blocked on pr#58.
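The scoring idea described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the DMQ-Node implementation: the class `FirstAnnouncerScores`, its method names, and the use of plain strings for peers and signature ids are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Hashable, Set


class FirstAnnouncerScores:
    """Count, per peer, how often it was first to announce a valid signature.

    Credits expire after `retention_seconds` (one hour by default, mirroring
    the retention period mentioned in the text).
    """

    def __init__(self, retention_seconds: float = 3600.0) -> None:
        self.retention = retention_seconds
        # Signature ids already credited. Never pruned in this sketch; a real
        # implementation would need to bound this set.
        self._seen: Set[Hashable] = set()
        # peer -> timestamps of its first announcements, oldest first.
        self._credits: Dict[str, Deque[float]] = defaultdict(deque)

    def record_valid_announcement(self, peer: str, sig_id: Hashable, now: float) -> None:
        """Credit `peer` only if nobody announced `sig_id` before."""
        if sig_id in self._seen:
            return
        self._seen.add(sig_id)
        self._credits[peer].append(now)

    def score(self, peer: str, now: float) -> int:
        """Number of unexpired first announcements credited to `peer`."""
        credits = self._credits[peer]
        while credits and credits[0] <= now - self.retention:
            credits.popleft()
        return len(credits)


scores = FirstAnnouncerScores()
scores.record_valid_announcement("peer-a", "sig-1", now=0.0)
scores.record_valid_announcement("peer-b", "sig-1", now=1.0)    # too late, no credit
scores.record_valid_announcement("peer-a", "sig-2", now=100.0)
```

Letting credits expire keeps the score reflective of recent behaviour, so a peer that stops announcing early loses its advantage after the retention period.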

IO-Sim and TypedProtocols

The io-sim and typed-protocols repositories were moved to the IntersectMBO organisation on GitHub.

Leios

PR / Issue | Status
TxSubmission V2 without a decision thread | in review
Block decoder: incremental vs. non-incremental benchmark | in review
Bump cardano-base for Leios voting | merged
TxSubmission v2 demo | merged
simple TCP model for sim-rs | merged

Ouroboros-Network

PR / Issue | Status | Notes
Add NodeToClientV_24 for ValidateTx query | in review |
cardano-diffusion: label ledger peer usage | in review | contribution by dancewithheart
Various changes | merged |
Renamed selectEnvTargets to selectGovTargets | merged | contribution by dancewithheart
Upgrade to contra-tracer 0.2.1 | merged | contribution by f-f
Decouple bracketFetchClient and bracketKeepAlive | merged |
cardano ping implemented with ouroboros-network | blocked |

DMQ-Node

PR / Issue | Status
node kernel cleanup | in progress
Updated ouroboros-network version | in review
dmq-node-0.5.0.0 pre-release | merged
SigId validation | merged
Drop support for x86_64-darwin | merged
announcyness metric for peer selection | merged

IOSim

PR / Issue | Status
Repo moved to IntersectMBO | merged

Typed Protocols

PR / Issue | Status
Repo moved to IntersectMBO | merged

Mithril Team Update

· 3 min read
Jean-Philippe Raynaud
Mithril Tech Lead

High level overview

This week, the Mithril team completed the analysis of the impact of the recursive SNARK on the security of the Mithril protocol, implemented benchmarks for the non-recursive SNARK, and refactored the error handling of the SNARK recursive circuit. They continued work on circuit key caching for the SNARK circuit in the STM library, the removal of the helpers module for the SNARK recursive circuit, the prover input for the recursive SNARK aggregation primitives, and the implementation of the SNARK-friendly genesis certificate.

They also enhanced the synchronization of immutable files in the Cardano database, implemented robust support for unknown and in-progress signed entity types, removed the Cardano database v1 backend, and continued working on the prototype for Cardano node ledger state certification. Additionally, they enforced the DMQ message ID format and enhanced the support of the genesis verification key in the explorer.

Finally, the team completed the update of the protocol security page on the website and continued work on shipping the Mithril signer node binary in the Cardano node bundle.

Low level overview

Features

  • Completed the issue Refactor SNARK recursive circuit - Error handling #3127
  • Completed the issue Impact of recursive SNARK on Mithril protocol security #3133
  • Completed the issue Implement benchmarks for non recursive SNARK #3154
  • Completed the issue Enhance synchronization of immutable files of Cardano database #3243
  • Completed the issue Add non-recursive certificate circuit benchmarks #3274
  • Worked on the issue Circuit keys caching for SNARK circuit in STM #3043
  • Worked on the issue Refactor SNARK recursive circuit - Remove helpers module #3132
  • Worked on the issue Recursive SNARK aggregation primitives: Prepare prover input #3138
  • Worked on the issue Implement SNARK-friendly genesis certificate #3145
  • Worked on the issue Prototype Cardano node ledger state certification #3269
  • Worked on the issue Enhance support of genesis verification key in explorer #3270

Protocol maintenance

  • Completed the issue Enhance protocol security page on website #2703
  • Completed the issue Robust support for unknown and in progress signed entity types #3172
  • Completed the issue Remove Cardano database v1 backend #3268
  • Worked on the issue Ship Mithril signer node binary in Cardano node bundle in GitHub #3011
  • Worked on the issue Enforcement of DMQ message id format #3251

Mithril Team Update

· 3 min read
Jean-Philippe Raynaud
Mithril Tech Lead

High level overview

This week, the Mithril team completed the refactoring of the recursive circuit, the preparation of the prover input implementation in the STM library, the off-circuit verification tests for the recursive SNARK circuit prototype, and the replacement of the temporary certificate circuit with the STM circuit. They also continued work on circuit key caching for the SNARK circuit in the STM library, the recursive SNARK aggregation primitives prover input, the preparation of the SNARK-friendly genesis certificate implementation, and the non-recursive certificate circuit benchmarks.

The team continued work on shipping the Mithril signer node binary in the Cardano node bundle, robust support for unknown and in-progress signed entity types, enforcement of the DMQ message ID format, and enhancements to immutable file synchronization for the Cardano database.

Finally, the team completed enforcement of Mithril crate versions in downstream Mithril crates and enhanced the protocol security page on the website.

Low level overview

Features

  • Completed the issue Prepare the refactoring of the recursive circuit #3126
  • Completed the issue Prepare implementation of the prover input in STM #3137
  • Completed the issue Add off-circuit verification tests for recursive SNARK circuit prototype #3193
  • Completed the issue Replace temporary certificate circuit with STM circuit #3195
  • Worked on the issue Circuit keys caching for SNARK circuit in STM #3043
  • Worked on the issue Recursive SNARK aggregation primitives: Prepare prover input #3138
  • Worked on the issue Implement SNARK-friendly genesis certificate #3145
  • Worked on the issue Prepare SNARK-friendly genesis certificate implementation #3262
  • Worked on the issue Add non-recursive certificate circuit benchmarks #3274

Protocol maintenance

  • Completed the issue Enforce Mithril crates versions in downstream Mithril crates #3245
  • Worked on the issue Ship Mithril signer node binary in Cardano node bundle in GitHub #3011
  • Worked on the issue Enhance synchronization of immutable files of Cardano database #3243
  • Worked on the issue Enforcement of DMQ message id format #3251
  • Worked on the issue Enhance protocol security page on website #2703
  • Worked on the issue Robust support for unknown and in progress signed entity types #3172

Consensus Team Update

· 2 min read
Damian Nadales
Consensus Team Lead

High level summary

  • Leios prototype development (Treasury Funding Initiative 4: Ouroboros Leios Implementation):
    • Landed the first voting capability in the Leios prototype: nodes now diffuse votes over a dedicated mini-protocol and a voting thread casts votes on completed endorser-block closures. This is the foundation for committee-based endorsement and is exercised by new threadnet property tests (#1963).
    • Ongoing: reworking the prototype branch ("Leios prototype remake") to target the same ouroboros-consensus-3.0.1.0 release that ships in cardano-node 11.0.1, so downstream consumers building against that node release can pick up Leios without a separate consensus branch (#2041).
    • Ongoing: adding late-join support, so a node that joins the network after an endorser block was produced can still resolve the resulting certified blocks (#2040).
    • Ongoing: replacing the placeholder voting from #1963 with stake-based committee selection and real BLS signatures, so votes are individually validated before being relayed (#2039).
    • Ongoing: performance work on the in-memory Leios database to remove contention and laziness issues that were causing nodes to time out under load (#2032).
  • LedgerDB cleanup (Treasury Funding Initiative 10: LSM including UTXO-HD):
    • Retired the V1 LedgerDB implementation and the LMDB backing store. V2 has been the default for some time; removing V1 deletes a large amount of now-unreachable code, drops the LMDB dependency, and simplifies the LedgerDB API (for example, snapshots no longer block the caller, and the tryFlush no-op is gone) (#2030). This paves the way for adding more tables to the ledger state, enabling them to be stored on disk.