· 4 min read
Michael Karg

High level summary

  • Benchmarking: Release benchmarks for Node 9.1.1; additional UTxO-HD in-memory benchmarks.
  • Development: Created a local reproduction for observed UTxO-HD RAM increase.
  • Workbench: Created a new "age of Voltaire" performance baseline. The adjusted Nomad backend has entered the testing phase.
  • Infrastructure: Dropping the requirement on Vault, optimizing cluster setup.
  • Tracing: New metrics naming schema was merged. Routing to internal monitoring servers is ongoing. Dropping dependency on HsOpenSSL.

Low level overview

Benchmarking

Runs and analyses for a full set of release benchmarks have been performed for Node version 9.1.1. In comparison with Mainnet releases 9.0 and 9.1.0, we determined that this version does not exhibit any performance regression.

Having been provided with the patch by Consensus targeting the increased RAM usage of the UTxO-HD in-memory backend (read below), we've performed additional benchmarks to validate the desired result on the cluster. Our measurements demonstrate that the increased memory need has now vanished. We're confident that by now we've located - and addressed - all performance risks for UTxO-HD in-memory that we can capture given the instruments at our disposal. To gain further confidence in the stability of the resource usage patterns and network metrics observed on the benchmarking cluster, we've advised running UTxO-HD nodes long-term under close monitoring.

Development

We succeeded in creating a local reproduction of the increase in RAM usage that was observed for the UTxO-HD in-memory backend on the cluster. That reproduction enabled the Consensus team to inspect in real-time and profile running Node processes - which led to a swift identification of the underlying cause and a patch addressing it.

Workbench

After the smooth Chang hard fork, which transitioned Cardano into the Conway era, we've created - and merged - a new performance baseline. It's intended for release benchmarks and caters to the new features of the Conway ledger. Apart from incorporating the latest protocol version and Plutus cost models, it includes DRep presence in the ledger when performing measurements.

The PR preparing our workbench for a nixpkgs upgrade and removing the container-based Nomad / podman backend is complete and has entered the testing phase.

Infrastructure

Currently, our Nomad cluster uses Vault to manage access and credentials for the benchmarking cluster. As the cluster relies exclusively on static routes and fixed deployment endpoints, encoding access as a set of rules in the cloud infrastructure is a viable option. That way, we will no longer depend on the Vault service, removing the requirement of hosting and maintaining an instance of it.

Tracing

Aligning the metrics naming schema and semantics between the new and legacy tracing systems has been completed and merged. This will allow the community to switch between the two systems seamlessly, as all existing configurations of monitoring services retain their validity.

As for hosting multiple EKG metrics monitors in one single service application, we ascertained that the ekg package was not built for that use case. However, we've come up with a much nicer design for cardano-tracer using dynamic routing based on the names of nodes connected to it. It has successfully passed prototype stage in that it's able to serve multiple EKG monitors without the need for any server restart; the full implementation is being worked on.
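To illustrate routing by node name, here is a minimal Python sketch. It is not the cardano-tracer implementation: the `MonitorRouter` class, its method names and the `/<node-name>/...` path layout are all assumptions made for this example. It only shows how a route table keyed by node name lets monitors appear and disappear without restarting the server.

```python
from typing import Callable, Dict, Optional


class MonitorRouter:
    """Hypothetical router: maps /<node-name>/<rest> to a per-node handler."""

    def __init__(self) -> None:
        # Route table keyed by node name; it changes at runtime.
        self._handlers: Dict[str, Callable[[str], str]] = {}

    def connect(self, node_name: str, handler: Callable[[str], str]) -> None:
        # Register a monitor for a newly connected node.
        self._handlers[node_name] = handler

    def disconnect(self, node_name: str) -> None:
        # Remove the monitor again; unknown names are ignored.
        self._handlers.pop(node_name, None)

    def route(self, path: str) -> Optional[str]:
        # Split "/node-1/ekg" into the node name and the remaining path.
        parts = path.strip("/").split("/", 1)
        node_name = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        handler = self._handlers.get(node_name)
        if handler is None:
            return None  # no monitor for that node (yet)
        return handler(rest)
```

A request for a node that is not yet connected returns `None`. Once `connect` has been called for that name, the same request is served by the running router.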

Last but not least, both existing tracing systems rely on the snap server framework and thus, by transitive dependency, on HsOpenSSL to speak the HTTPS protocol. However, we've determined that the latter package risks breaking the build, both currently and in the future (cf. HsOpenSSL#95 and HsOpenSSL#88). As a consequence, we decided to switch to the wai / warp based framework, which implements HTTPS capability differently, thus preempting the risk. The switch has already been carried out for the legacy system and is currently underway for cardano-tracer - a big shoutout to Erik de Castro Lopo for his support on that issue.

· 2 min read
Kevin Hammond

Blocks from the future

We identified two issues relating to "blocks from the future".

  1. Blocks from the near future
  2. Blocks from the far future

While blocks from the near future have been known to occur on mainnet as a result of clock skew/misconfiguration, there are no known instances on mainnet of blocks from the far future. In both cases, restarting an affected node would resolve the issue.

What is meant by a Block from the Future?

A node considers a block to be from the future if its slot is ahead of the current slot. Ouroboros Praos mandates that all chains containing blocks from the future (at that time) are ignored during chain selection. As Praos assumes that all nodes have access to perfectly synchronized clocks, this will never cause nodes to disregard blocks that have been minted by other honest nodes. In an actual real-world deployment, this assumption is unrealistic due to the imperfections of protocols like NTP as well as leap seconds.
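The definition above can be sketched in a few lines of Python. This is an illustration only, not the node's actual logic: the function names, the way a slot is derived from wall-clock time, and in particular the `max_skew_slots` threshold separating "near" from "far" are assumptions made for this example. The text does not state where that boundary lies.

```python
from enum import Enum


class Verdict(Enum):
    NOT_FROM_FUTURE = "not from the future"
    NEAR_FUTURE = "near future"
    FAR_FUTURE = "far future"


def slot_at(now_s: float, system_start_s: float, slot_length_s: float) -> int:
    # Current slot derived from the local wall clock (assumed fixed slot length).
    return int((now_s - system_start_s) // slot_length_s)


def classify(block_slot: int, current_slot: int, max_skew_slots: int) -> Verdict:
    # A block is from the future if its slot is ahead of the current slot.
    if block_slot <= current_slot:
        return Verdict.NOT_FROM_FUTURE
    # Hypothetical tolerance: a small lead is attributed to clock skew.
    if block_slot - current_slot <= max_skew_slots:
        return Verdict.NEAR_FUTURE
    return Verdict.FAR_FUTURE
```

Because `current_slot` depends on the local clock, two honest nodes with slightly different clocks can disagree on the verdict for the same block.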

The issues that were identified meant that blocks from the future could potentially be used by malicious actors to create denial-of-service attacks.

Both issues were fixed in Cardano node 8.8 and later, and were eradicated at the Chang hard fork.

Further Details

Report on Blocks from the Near Future

Report on Blocks from the Far Future

· 2 min read
John Lotoski

High level summary

The SRE team continues work on Cardano environment improvements and general maintenance.

Some notable recent changes, updates or improvements include:

  • Mainnet was hard forked to Conway era!

  • Legacy mainnet relays from cardano-ops cluster were stopped and retired.

  • Legacy cardano-explorer hosted at explorer.cardano.org was retired, with the landing page and beta explorer services now provided by the Cardano Foundation.

  • Cardano-smash production load was cut over from the legacy cardano-world cluster to the replacement cardano-mainnet cluster. Remaining cardano-world resources will be retired in the near future.

  • Cardano-faucet was updated for cardano-node 9.1.x level compatibility.

Repository Work

Cardano Faucet

  • Brings faucet up to the cardano-api and cardano-cli level of cardano-node 9.1: bumps relevant flake pins, updates CHaP indexes, applies fixes for upstream breaking changes, removes cardano-addresses srp, adjusts ghc options, fixes mingw32 CI builds, applies most hlint and fourmolu style and config suggestions respectively: cardano-faucet-pull-12

Cardano Parts

  • Sets cardano-node to 9.1.1, cardano-db-sync to 13.5.0.2, cardano-faucet to 9.1. Adds alerts, dashboard fixes, nixos iowait optimization, smash and blockperf nixosModule improvements. More detail is available in the PR description: cardano-parts-pull-47

Cardano-mainnet

  • Deploys cardano node to 9.1.1, cardano-db-sync to 13.5.0.2. Improves smash deployments and backup role for production load handling. Improvements made in cardano-parts PR#47 are included in this PR. More detail is available in the PR description: cardano-mainnet-pull-21

Cardano-playground

  • Deploys cardano node to 9.1.1, cardano-db-sync to 13.5.0.2, cardano-faucet to 9.1. Tests RTS parameter optimization and tracing system changes on preview network machines, tests utxo-hd-9.1.1 on mainnet edge nodes. Improvements made in cardano-parts PR#47 are included in this PR. More detail is available in the PR description: cardano-playground-pull-31

Cardano-world

  • Destroy retired legacy explorer metal machines and disable alerting: commit-compare

· One min read
Damian Nadales

High level summary

  • Added a snapshot-converter tool, which will be merged soon. This tool converts non-UTxO-HD ledger snapshots into UTxO-HD ones, so that the user does not have to replay from Genesis when using a UTxO-HD capable node. This patch also solves an issue with deserialization of TxOuts in Conway in the UTxO-HD implementation.
  • Solved a memory leak in the UTxO-HD implementation. This patch will be benchmarked this week.
  • Wrote a test for adding large txs to the mempool.
  • Expanded the Mempool capacity beyond just byte size.
  • @amesgen discovered and advised on a Conway ledger snapshot deserialization bug.
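To illustrate what a mempool capacity "beyond just byte size" can look like, here is a minimal Python sketch of a capacity check across several dimensions. It is not the Consensus implementation: the `Mempool` class and the dimension names used in the usage note (`bytes`, `exec_units`) are hypothetical, as the summary above does not name the additional dimensions.

```python
from typing import Dict, List

# A size or capacity is a mapping from dimension name to amount.
Size = Dict[str, int]


class Mempool:
    """Hypothetical mempool whose capacity has more than one dimension."""

    def __init__(self, capacity: Size) -> None:
        self.capacity: Size = dict(capacity)
        self.used: Size = {dim: 0 for dim in capacity}
        self.txs: List[str] = []

    def try_add(self, tx_id: str, size: Size) -> bool:
        # The transaction must fit in every dimension, not only in bytes.
        for dim, limit in self.capacity.items():
            if self.used[dim] + size.get(dim, 0) > limit:
                return False
        # Only account for the transaction once all checks have passed.
        for dim in self.capacity:
            self.used[dim] += size.get(dim, 0)
        self.txs.append(tx_id)
        return True
```

With a capacity of `{"bytes": 100, "exec_units": 10}`, a transaction that is small in bytes is still rejected if it would exceed the remaining `exec_units`.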

· 2 min read
Jean-Philippe Raynaud

High level overview

The Mithril team kept working on decentralizing the signature orchestration of the Mithril network. In this preliminary phase, they began implementing a buffer store for individual signatures that may arrive before being processed by an aggregator. Additionally, they moved the broadcast of signer registrations to the aggregator’s epoch settings route. The team also monitored the Chang upgrade to ensure the Mithril networks operated correctly and cleaned up unnecessary code. They also implemented a seamless transition from unstable to stable features in the WASM client.

Finally, the team activated a feature allowing the selection of the arithmetic backend used by Mithril cryptography in the client and refactored the organization of signer dependencies.
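The buffer store for early signatures mentioned above can be pictured with a short Python sketch. This is an assumption-laden illustration, not Mithril's code (which is written in Rust): the `SignatureBuffer` class, its methods, and the use of plain strings for message keys and signatures are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


class SignatureBuffer:
    """Hypothetical store for signatures whose message is not yet open."""

    def __init__(self) -> None:
        # Pending signatures grouped by the message they sign.
        self._pending: Dict[str, List[str]] = defaultdict(list)

    def buffer(self, message_key: str, signature: str) -> None:
        # Keep the signature until the aggregator knows the message.
        self._pending[message_key].append(signature)

    def drain(self, message_key: str) -> List[str]:
        # Hand over, in arrival order, everything buffered for the message
        # and forget it; returns an empty list if nothing was buffered.
        return self._pending.pop(message_key, [])
```

Once the aggregator opens a message, it would call `drain` with that message's key and process the returned signatures as if they had just arrived.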

Low level overview

  • Completed the issue Follow up the Chang hard fork #1910
  • Completed the issue Aggregator exposes Cardano transactions signing configuration #1898
  • Completed the issue Optimize memory usage of signer for Cardano transactions #1903
  • Completed the issue Add message signed in signature HTTP messages #1899
  • Completed the issue Cargo Deny complains about LGPL-3.0 licenses #1786
  • Completed the issue Reorganize signer dependencies #1906
  • Worked on the issue Aggregator buffers signatures for unknown open message #1900
  • Worked on the issue Signer retrieves registrations with epoch settings route #1897
  • Worked on the issue Seamless transition of features from unstable to stable in client WASM #1911
  • Worked on the issue Test Cardano transaction chain rollbacks #1840