
· 5 min read
Michael Karg

High level summary

  • Benchmarking: Finalized voting benchmarks on Node 10.0; workload implementation being generalized to be mergeable.
  • Development: Database-backend for our analysis tool locli merged; several metrics improvements with new tracing.
  • Tracing: C library for trace forwarding started; documentation improved; timing issue in forwarder fixed.

Low level overview

Benchmarking

The voting benchmarks have now finished. The exact implementation of how the voting workload is set up and submitted has been finalized and is currently being prepared for merging into master. This will add those benchmarks to the repertoire we can run on any future node version, and track potential performance changes of voting over time.

The setup allows us to add voting as an additional workload on top of the existing release benchmarking workloads - typically "value-only" and "Plutus loop". The value workload operates at 12 TPS and always results in full blocks; we can draw a straight-line comparison when a certain, constant percentage of each block is filled with vote transactions. The Plutus workload, however, is throttled by exhausting the block execution budget rather than by transaction size and TPS - in contrast to voting submissions. This results in a large variance in the block sizes the network produces, and restricting analysis to the blocks that are actually comparable to each other greatly reduces the sample size.

This means that in practice, we've found "voting on top of value-only" to represent the performance implications of voting most accurately. This workload will serve as a base for comparison over time, and will be run selectively on new versions, whenever the proposal / voting feature of the Conway ledger is touched.

As a conclusion to those benchmarks we've ascertained that:

  1. there is a performance cost to voting, vote tallying and proposal enactment
  2. on the system level, this cost is very reasonable and poses no operational risk
  3. on the service level, processing an individual voting transaction is even slightly cheaper performance-wise than a transaction consuming and creating multiple UTxO entries

Development

The analysis and reporting tool locli ("LogObject CLI") now comes equipped with a database-backed persistence layer. This new backend has been validated by using it to re-analyse past benchmarks. Performance workbench integration has been completed, and by means of a new environment variable this backend can be enabled for use in automations. It currently co-exists in locli with the default file-system-based persistence backend.

Apart from opening up raw benchmarking data to the full power of SQL queries, or quickly marshalling it into another format to feed into other applications, the new storage backend has considerable advantages in execution speed and resource usage. It requires both less RAM at runtime (around 30% less) and less disk space (about 90% less). Standard analysis of a cluster run can now be performed in less than an hour, whereas it previously took around two hours.
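To give a flavour of the kind of ad-hoc exploration this opens up, here is a minimal Haskell sketch querying such a database with sqlite-simple. The file name and the blocks table with slot and size_bytes columns are purely illustrative assumptions, not locli's actual schema.

{-# LANGUAGE OverloadedStrings #-}

-- Illustrative only: table and column names are hypothetical, not locli's real schema.
import Database.SQLite.Simple (close, open, query_)

main :: IO ()
main = do
  -- Assumed database file produced by the DB-backed persistence layer.
  conn <- open "run.sqlite3"
  -- Ad-hoc example: average block size per 1000-slot bucket.
  rows <- query_ conn
            "SELECT slot / 1000 AS bucket, AVG(size_bytes) FROM blocks GROUP BY bucket"
            :: IO [(Int, Double)]
  mapM_ print rows
  close conn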

Currently, we're working on implementing parametrizable quick queries of that raw data - complete with plotting capabilities for locli. The queries are meant to easily extract and visualize very specific correlations that are not part of standard analysis, catering to the oftentimes investigative nature of performance analysis.

Furthermore, the new tracing system now provides direct insight into the chain tip's hash, exposing tipBlockHash, tipBlockParentHash, and tipBlockIssuerVerificationKeyHash both as trace messages and as metrics. Additionally, we've merged a fix for issue cardano-node#5751: the metric forging_enabled now correctly also observes the presence of the CLI option --non-producing-node.
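For operators who want to verify that the new tip metrics show up, a small sketch along the following lines could scrape a Prometheus-style metrics endpoint and filter for the new names. The host, port, and path are assumptions that depend on the local tracer/node configuration, and exposed metric names may carry a prefix.

{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Char8 as BS
import           Network.HTTP.Simple   (getResponseBody, httpBS, parseRequest_)

main :: IO ()
main = do
  -- Assumed local metrics endpoint; adjust host, port and path to your setup.
  resp <- httpBS (parseRequest_ "http://127.0.0.1:12798/metrics")
  let wanted = ["tipBlockHash", "tipBlockParentHash", "tipBlockIssuerVerificationKeyHash"]
      -- Substring matching keeps the filter lenient towards configurable name prefixes.
      keep l = any (`BS.isInfixOf` l) wanted
  mapM_ BS.putStrLn (filter keep (BS.lines (getResponseBody resp)))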

Tracing

The new tracing system allows for trace and metrics forwarding from some process to cardano-tracer. For any Haskell application, the forwarder package can easily be included as a library. For applications written in other programming languages, we've decided that a small, self-contained C library handling forwarding is a viable way to provide this functionality to a much wider range of ecosystems. The C library will implement the protocol handshake and possibly muxing, the forwarder protocol messages being used, and the CBOR-based encodings of trace messages and metrics - all of which currently exist only in Haskell. We've just started the prototype.
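To give a rough idea of the kind of encoding the C library will have to reproduce, here is a purely illustrative Haskell sketch that CBOR-encodes a trace-message-like record with cborg. The record shape and field layout are invented for illustration and are not the actual trace-forwarding wire format.

{-# LANGUAGE OverloadedStrings #-}

import           Codec.CBOR.Encoding (Encoding, encodeListLen, encodeString, encodeWord64)
import           Codec.CBOR.Write    (toStrictByteString)
import           Data.ByteString     (ByteString)
import           Data.Text           (Text)
import           Data.Word           (Word64)

-- Hypothetical trace message shape, for illustration only.
data TraceMsg = TraceMsg
  { tmNamespace :: Text
  , tmTimestamp :: Word64
  , tmMessage   :: Text
  }

-- Encode as a fixed-length CBOR list; the real forwarder encoding differs.
encodeTraceMsg :: TraceMsg -> Encoding
encodeTraceMsg (TraceMsg ns ts msg) =
     encodeListLen 3
  <> encodeString ns
  <> encodeWord64 ts
  <> encodeString msg

serialise :: TraceMsg -> ByteString
serialise = toStrictByteString . encodeTraceMsg

main :: IO ()
main = print (serialise (TraceMsg "node.ChainDB" 1735689600 "example trace"))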

We've been working hard on updating and improving the documentation for the new tracing system on https://developers.cardano.org (not merged yet). The aim was to provide a quick-start guide to "just get it set up and running", without presupposing any knowledge of tracing or Haskell. Moreover, for users coming from the legacy tracing system, we wanted to highlight the key differences between the two systems - and the possibly different assumptions when operating them.

Last but not least, we caught a very interesting timing issue in the forwarder: each service connected to cardano-tracer bears both an internal and an external name for the connection (both unique), where the external name is chosen by the service itself. Upon forwarder initialization, so-called data points are set up within the service, into which data can then be traced (such as that external name), and which are actively polled / queried by cardano-tracer. As these are all concurrent events, the external name wasn't yet available in the data point if initialization of forwarding happened "too fast". Once located, the fix was trivial: enforce a relative ordering of the concurrent events during initialization only.
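The general shape of such a fix can be illustrated with a tiny concurrency sketch: let the polling side wait on a synchronisation point that is only signalled once the data point has been populated. This is a simplified analogy in plain Haskell, not the actual forwarder code.

import Control.Concurrent      (forkIO)
import Control.Concurrent.MVar

main :: IO ()
main = do
  -- The "data point": starts empty, is filled during service initialisation.
  dataPoint <- newMVar Nothing
  -- Synchronisation point: signalled only after the external name has been traced.
  ready     <- newEmptyMVar

  -- Service side: populate the data point, then signal readiness.
  _ <- forkIO $ do
    modifyMVar_ dataPoint (const (pure (Just "external-name-of-service")))
    putMVar ready ()

  -- Tracer side: previously it could poll before the name was set;
  -- waiting on 'ready' enforces the required relative ordering.
  takeMVar ready
  name <- readMVar dataPoint
  print name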

Happy New Year!

It's been an amazing year for the Performance & Tracing team. We're proud to have contributed to Cardano's transition into the age of Voltaire, to have reliably safeguarded the performance of the Cardano network - and to have finalized our new tracing system. A huge thanks to all those who've been helpful and supportive - and who've presented us with new ideas and challenges.

Have a Happy New Year 2025!

· 2 min read
Marcin Szamotulski

Overview of sprint 77

Initiator only mode for local roots

We implemented the initiator-only mode for local root peers described in ouroboros-network#5020. This feature will be available in cardano-node-10.3 (cardano-node#6055).

One will be able to specify diffusionMode (either InitiatorOnly or InitiatorAndResponder, the latter being the default) for all local roots in a given local roots group, e.g.

{ "localRoots":
[ { "accessPoints":
[ { "address": "10.0.0.1"
, "port": 3001
}
]
, "advertise": false
, "diffusionMode": "InitiatorOnly"
, "warmValency": 1
, "hotValency": 1
}
, { "accessPoints":
[ { "address": "10.0.0.2"
, "port": 3001
}
]
, "advertise": true
, "diffusionMode": "InititiatorAndResponder"
, "warmValency": 1
, "hotValency": 1
}
]
, "publicRoots": []
, "useLedgerAfterSlot": -1
}

As part of ouroboros-network#5020, we had to change how connections are identified in the simulated testnet environment. We exposed ConnStateIdSupply through the P2P interfaces, which allows us to use a global ConnStateIdSupply for all nodes in the simulation. This way, each connection in the simulation gets a unique ConnStateId. See ouroboros-network#5026.
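The idea of a single shared supply handing out unique identifiers can be sketched in a few lines; this is a simplified stand-in, not ouroboros-network's actual ConnStateIdSupply implementation, and the names are hypothetical.

import Data.IORef (IORef, atomicModifyIORef', newIORef)

-- Simplified stand-in for a shared ID supply, for illustration only.
newtype IdSupply = IdSupply (IORef Int)

newIdSupply :: IO IdSupply
newIdSupply = IdSupply <$> newIORef 0

freshId :: IdSupply -> IO Int
freshId (IdSupply ref) = atomicModifyIORef' ref (\n -> (n + 1, n))

main :: IO ()
main = do
  -- A single, global supply shared by every simulated node
  -- guarantees that each connection receives a unique identifier.
  supply <- newIdSupply
  ids <- mapM (\_ -> freshId supply) [1 .. 5 :: Int]
  print ids  -- [0,1,2,3,4]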

Support systems with multiple IP addresses

We merged ouroboros-network#5017, which allows running cardano-node on systems with multiple network interfaces or a single interface with multiple IP addresses.

Reusable diffusion

We have been working on early integration of the reusable diffusion work stream with ouroboros-consensus and cardano-node. Reusable diffusion will allow us to support both cardano-node and mithril-node in the future. We are pleased to say that we are running a cardano-node that uses the refactored diffusion. See:

Tx-Submission Logic

We had a discussion with the consensus team (Karl Knutsson, CF; Nick Frisby, Tweag) on network requirements for the tx-mempool. See:

SRV record support

We continued working on SRV record support; see:

Technical Debt

We renamed some of the test modules to be more consistent across various network components, see ouroboros-network#5028.

· 2 min read
John Lotoski

High level summary

The SRE team continues work on Cardano environment improvements and general maintenance.

Some notable recent changes, updates or improvements include:

  • Nixpkgs has been updated to 24.11 across all environments

  • Nix has been updated to 2.25.3 across all environments

  • TCP tuning was applied on one relay per pool group across all environments to minimize round-tripping across long distances.

  • This is the last SRE update for 2024. Hello 2025!

Repository Work

Cardano-parts

  • Nixpkgs has been updated to 24.11 and Nix to 2.25.3. NixosModules and template just recipes affected by breaking changes from those updates were fixed. A nix jobs GHA CI test was added to verify the environment spin-up procedure. Template scripts were updated for compatibility with the latest cardano-node protocol version and recent cardano-cli breaking changes. More details are available in the release notes: cardano-parts-release-v2024-12-19

Cardano-playground

  • Nixpkgs has been updated to 24.11 and Nix to 2.25.3, and all machines were deployed along with fixes for breaking changes. A fund-transfer recipe was added, along with other miscellaneous improvements. More detail is available in the PR description: cardano-playground-pull-38

Cardano-mainnet

  • Nixpkgs has been updated to 24.11 and Nix to 2.25.3, and all machines were deployed along with fixes for breaking changes. Bootstrap scaling servers were disabled and auto-scheduled block producer restarts were stopped. TCP transmission optimization for long distances was applied to one relay per pool group. More detail is available in the PR description: cardano-mainnet-pull-28

Iohk-nix

· One min read
Noon van der Silk

High-level summary

Entering December, with some colleagues on holiday, we are finalising our outstanding work and continuing to support the Hydra Doom tournament. We remain focused on finishing incremental commits and getting multiple-version support into the explorer.

What did the team achieve?

  • Final reviews on incremental commits #199
  • Make it easier to publish docker images for branches #1756
  • Progress on custom ledger experiment #1742
  • Progress on Hydra explorer supporting multiple versions #1282

What's next?

  • Merge incremental commits #199
  • Hydra explorer supporting multiple versions #1282
  • Finish custom ledger experiment #1742
  • Plan the 0.20.0 release
  • Continue supporting Hydra Doom

· One min read
Damian Nadales

High level summary

  • Well-Typed held a new lsm-trees milestone presentation, where they showed progress on two important features:
    • Snapshots (for persisting ledger snapshots)
    • Table union (for storing more parts of the ledger state on disk)
  • Finished the UTXO-HD code review work. Since this feature could have a performance impact, we need to run a new set of system-level benchmarks before we can merge it. The next steps are detailed in this comment.
  • Submitted a request to the Technical Steering Committee on how the node should handle low apparent participation.
  • Added support for computing and checking CRCs of ledger state snapshots, which increases robustness when loading this data from disk (#1319); a rough sketch of the idea follows below.
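As a rough illustration of the mechanism rather than the actual implementation, a CRC32 over a snapshot file could be computed with the digest package and compared against a stored value along these lines; the file names and checksum format are assumptions.

import qualified Data.ByteString.Lazy as BL
import           Data.Digest.CRC32    (crc32)
import           Data.Word            (Word32)
import           Text.Printf          (printf)

main :: IO ()
main = do
  -- Assumed file names, purely for illustration.
  snapshot <- BL.readFile "ledger-snapshot.bin"
  expected <- read <$> readFile "ledger-snapshot.crc32" :: IO Word32
  let actual = crc32 snapshot
  if actual == expected
    then putStrLn "CRC matches; snapshot looks intact."
    else printf "CRC mismatch: expected %u, got %u\n" expected actual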