· 3 min read
John Lotoski

High level summary

The SRE team continues work on Cardano environment improvements and general maintenance.

Some notable recent changes, updates or improvements include:

  • The preprod network was hard forked to Conway era.

  • The nixosModule profile-blockperf in cardano-parts now includes prometheus metrics, automatically scraped with grafana-agent along with a dashboard.

  • A nixosModule profile-tcpdump in cardano-parts is now available to push ongoing pcaps to s3 for historical reference.

  • Old dev environments were cleaned up and retired after the completion of the ouroboros-network-ops cluster migration to the cardano-parts stack.

  • Causes of the delayed block headers on mainnet relays, as indicated by blockperf, were investigated and mitigated with adjustments to RTS parameters and machine class.

  • A Conway-era increase in mempool log volume was investigated and resolved with ouroboros-network improvements.

  • Scaling capability was added to the cardano-mainnet bootstrap cluster.

Repository Work

Cardano Parts

  • Sets cardano-db-sync (release) to 13.4.0.0. Includes nixosModule improvements: a manual trigger for the cardano-db-sync snapshots module, new prom metrics in the blockperf module, an auto-blockperf scrape config in the grafana-agent module and a new tcpdump module for persistent pcaps to s3. Recipe improvements for configuration consistency checking and improved openTofu AMI and DNS filtering have been made. The AWS machine reference spec has been updated and one alert tuned for better sensitivity. More detail is available in the PR description: cardano-parts-pull-46

Cardano-mainnet

  • Deploys cardano-db-sync (release) to 13.4.0.0. Deploys nixosModule improvements: a manual trigger for the cardano-db-sync snapshots module, new prom metrics in the blockperf module, an auto-blockperf scrape config in the grafana-agent module and a new tcpdump module for persistent pcaps to s3. Recipe improvements for configuration consistency checking and improved openTofu AMI and DNS filtering have been made. Makes changes to pool group relays to eliminate or reduce delayed block headers. Tests additional dev patches for missingBlock errors. Adds bootstrap cluster scaling capability and a bootstrap cluster dashboard. Improvements made in cardano-parts PR#46 are included in this PR. More detail is available in the PR description: cardano-mainnet-pull-20

Cardano-ops (Legacy Mainnet)

  • Over a two-week period, the legacy relay nodes were scaled down a further 50% from the recent machine quantity peak. commit-compare

Cardano-playground

  • Preprod was hard-forked to Conway. Deploys cardano-db-sync to 13.4.0.0. Recipe improvements for configuration consistency checking and improved openTofu AMI and DNS filtering have been made. Improvements made in cardano-parts PR#46 are included in this PR. More detail is available in the PR description: cardano-playground-pull-30

Cardano-world

  • Updates openssh to 9.8p1 on the remaining (soon-to-be-retired) cardano-world cluster machines. commit

· One min read
Damian Nadales

High level summary

During the past week the team:

  • Incorporated minor improvements to the ChainSync client test (#529).
  • Documented tasks of a caught-up node (#1215), which can be useful for SPOs.
  • Tweag presented the last Genesis SoW to the Consensus team. The next steps are reviews and phased (opt-in) rollout.

Two problems were found during the UTXO-HD benchmarks: an increase in heap size (#1192) and a newly found race condition (#1193). #1208 fixed the race condition and was merged. However, #1194 showed no improvement, so it will not be merged yet.

· One min read
Noon van der Silk

High-level summary

Firstly, we had a successful launch of the Hydra Doom project at RareEvo! Coinciding with this, we updated our landing page and released a minor version with a small but important bugfix. In the next period we will continue our focus on incremental commits, network testing, and general API compatibility.

What did the team achieve?

What's next?

  • Test more network resilience scenarios #1575
  • Continued work on incremental commit #199
  • Switch ledger to Conway #1178
  • Investigate how to be compatible with cardanonical #1577

· 2 min read
Jean-Philippe Raynaud

High level overview

The Mithril team has completed their work on certifying Cardano's stake distribution. They implemented the client library, client CLI, and client NPM package. Additionally, they drafted a CIP for the diffusion of Mithril signatures through the Cardano network, which is available in a PR on the CIPs repository.

They also implemented a mechanism in the client to support evolutive configuration options and initiated a proof of concept for integrating signature diffusion with the Cardano network layer. Finally, they created a new runbook in the documentation and made progress on external contributions to the repository.

Low level overview

  • Created a draft PR for the Decentralized Message Queue CIP #876
  • Completed the issue CIP for Mithril signature diffusion through Cardano network #1775
  • Completed the issue Implement Cardano Stake Distribution in client library #1842
  • Completed the issue Implement Cardano Stake Distribution in client CLI #1880
  • Completed the issue Implement Cardano Stake Distribution in WASM client #1881
  • Completed the issue Update explorer for Cardano Stake Distribution #1843
  • Completed the issue Document Cardano Stake Distribution #1844
  • Worked on the issue Future proof options for mithril client #1878
  • Worked on the issue Mithril signature diffusion with Cardano network layer PoC #1837
  • Worked on the issue Test Cardano transaction chain rollbacks #1840
  • Worked on the issue Create repository dependencies upgrade runbook #1813

· 4 min read
Michael Karg

High level summary

  • Benchmarking: Release benchmarks for Node 9.1; UTxO-HD in-memory benchmarks; typed-protocols feature benchmarks.
  • Development: Correct resource trace emission for CPU 85% spans metric. Governance action benchmarking still under development.
  • Workbench: Preparations for bumping nixpkgs. Started removal of the container-based podman backend. Support GHC9.8 nix shells.
  • Infrastructure: Test and validate an upcoming change in node-to-node submission protocol.
  • Tracing: cardano-tracer: Support of non-systemd Linux was merged; safe restart of internal monitoring servers.

Low level overview

Benchmarking

We've run and analyzed a full set of release benchmarks for Node version 9.1. Comparing with the mainnet release 9.0, we could not observe any performance regression.

Additionally, we've performed feature benchmarks for an upcoming new API for typed-protocols. Those did not exhibit any regression either in comparison with the baseline using the current API.

Furthermore, we've performed various benchmarks for the UTxO-HD in-memory backend on Node versions 9.0 and 9.1. Based on those observations, a rare race condition could be eliminated, where block producers on occasion failed to fork off a thread for the forging loop. The overall network performance of the UTxO-HD in-memory backend shows a slight improvement over the regular node, but currently comes with slightly increased RAM usage.

Development

We've spotted an inconsistency in one of our benchmarking metrics - CPU 85% spans - which measures the average number of consecutive slots where CPU usage spikes to 85% or higher (however short the spike itself might be). There was a difference between the legacy tracing system (which yielded the correct value) and the new one, for which a fix has already been devised.
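To make the metric's definition concrete, here is a minimal sketch of how such a value could be computed. It is an illustration only, not the benchmarking tooling's actual code: the function name `cpu_spans_mean`, the input format (one peak CPU percentage per slot) and the choice to return 0.0 when no span exists are all assumptions.

```python
from itertools import groupby


def cpu_spans_mean(per_slot_cpu, threshold=85.0):
    """Average length, in slots, of runs of consecutive slots at or above the threshold.

    per_slot_cpu: hypothetical input, one peak CPU usage percentage per slot.
    Returns 0.0 when no slot reaches the threshold (an assumed convention).
    """
    # Group the slots into alternating runs of "at/above threshold" and "below",
    # and keep only the lengths of the runs that are at or above the threshold.
    runs = [
        sum(1 for _ in group)
        for above, group in groupby(per_slot_cpu, key=lambda pct: pct >= threshold)
        if above
    ]
    return sum(runs) / len(runs) if runs else 0.0
```

With this reading, a slot counts toward a span as soon as its peak usage reaches the threshold, which matches the "however short the spike itself might be" remark above.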

The implementation of Conway governance action workloads for benchmarking is ongoing.

Workbench

With a nixpkgs bump on the horizon, we're working on adjusting, and testing, our usage of packages that change their status, lose their support, or packages that require pinning a version for the workbench.

Additionally, we'll remove a container-based backend for workbench, which ties in OCI image usage on podman with Nomad. It was a precursor to the current Nomad backend, which is containerless and can directly build Nomad jobs using nix.

Last but not least, we've merged a small PR which enables our workbench to build nix shells with GHC9.8, as this not only pulls in the compiler, but much of the Haskell development toolchain. The correct version couplings between compiler and toolchain components are now declared explicitly from GHC8.10.7 up to GHC9.8.

Infrastructure

We've tested and validated an upcoming change in ouroboros-network which demands any node-to-node submission client to hold the connection for at least one minute before being able to submit transactions. The change works as expected and does not interfere with special functionality required by benchmarking.
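The effect of such a hold-time rule on a submission client can be sketched as follows. This is a hypothetical model for illustration, not ouroboros-network code: the class `SubmissionConnection`, its methods and the constant `MIN_HOLD_SECONDS` are invented names, and the only fact taken from the text is the one-minute minimum.

```python
import time

# "At least one minute", per the text above; the constant name is hypothetical.
MIN_HOLD_SECONDS = 60.0


class SubmissionConnection:
    """Hypothetical model of a client connection that must be held before submitting."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so the waiting logic can be tested without sleeping.
        self._clock = clock
        self._opened_at = clock()

    def seconds_until_ready(self):
        """Remaining hold time in seconds; 0.0 once the minimum has elapsed."""
        elapsed = self._clock() - self._opened_at
        return max(0.0, MIN_HOLD_SECONDS - elapsed)

    def can_submit(self):
        """True once the connection has been held for the minimum duration."""
        return self.seconds_until_ready() == 0.0
```

A client built along these lines would open its connection early and check `can_submit()` (or wait for `seconds_until_ready()`) before sending its first transaction.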

Tracing

The trace consumer service for the new tracing system used to require systemd on Linux to build and operate. There are, however, Linux environments that choose to not use systemd. It is now possible to configure the desired flavour of that service, cardano-tracer, at build time, thus adding support for those Linuxes - cardano-node#5021.

cardano-tracer consumes not just traces, but also metrics. With the new tracing system, this shifts running a metrics server from the node to the consumer process. One possible setup in the new system is operating only one consumer service and connecting multiple nodes to it. In its current design, this requires safely shutting down and restarting the monitoring server, using the metrics store of any connected node that's been requested. We're currently battle-testing the built-in behaviour of ekg (the monitoring package that's being used) and exploring solutions in case it does not fully meet requirements.