· 2 min read
Jean-Philippe Raynaud

High level overview

This week, the Mithril team prepared the pre-release for the 2543.0-pre distribution. This version introduces support for the default incremental backend (v2) for Cardano database restoration, enhanced integrity verification that reports any tampered or missing files in case of failure, and various bug fixes and improvements.
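As a rough illustration of what reporting tampered or missing files can look like, here is a minimal sketch. It is not Mithril's implementation: the manifest format (relative path mapped to a SHA-256 hex digest) and the function name `verify_files` are assumptions made purely for this example.

```python
import hashlib
from pathlib import Path


def verify_files(root: Path, expected: dict[str, str]) -> dict[str, list[str]]:
    """Compare files under `root` against expected SHA-256 hex digests.

    `expected` is a hypothetical manifest mapping relative paths to digests.
    Returns the relative paths that are missing or whose digest differs.
    """
    report: dict[str, list[str]] = {"missing": [], "tampered": []}
    for rel_path, digest in sorted(expected.items()):
        path = root / rel_path
        if not path.is_file():
            report["missing"].append(rel_path)
        elif hashlib.sha256(path.read_bytes()).hexdigest() != digest:
            report["tampered"].append(rel_path)
    return report
```

Collecting every offending file before returning, rather than stopping at the first mismatch, is what lets a failure message list all affected files at once.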

The team also completed the integration of the Haskell DMQ node into the end-to-end tests to enable decentralized signature diffusion. They implemented a simple aggregator discovery mechanism and continued work on the first phase of decentralizing configuration parameters. Additionally, they advanced the design of certificate snarkification.

Finally, they adapted the project to the latest NPM security changes for publishing packages and refactored the aggregator's HTTP client.

Low level overview

Features

  • Pre-released the new distribution 2543.0-pre
  • Completed the issue Integrate the Haskell DMQ node in the e2e test #2674
  • Worked on the issue Decentralization of configuration parameters - Phase 1 #2692
  • Worked on the issue Implement a simple aggregator discovery mechanism #2726
  • Worked on the issue Release 2543 distribution #2727
  • Worked on the design of the snarkification of the certificates

Protocol maintenance

  • Worked on the issue Implement a common aggregator client - Phase 1 #2640
  • Worked on the issue Enhance protocol security page on website #2703
  • Worked on the issue Support NPM security changes with trusted publisher tokens #2745

· 5 min read
Michael Karg

High level summary

  • Benchmarking: Various maintenance to prepare for upcoming Node 10.6 changes; metrics migration guide for SPOs.
  • Development: Prototyping a PromQL-based alert system for new tracing.
  • Infrastructure: Located a space leak that was interfering with on-disk benchmarks.
  • Tracing: Equipping dmq-node with the new tracing system; cardano-tracer library and API ongoing.
  • Leios: Impact analysis and tech design; preparation for simulations hand-off.
  • Hydra: Kick-off for development of system integration level benchmarks.
  • Node diversity: Trace semantics specification ongoing; example test case ready for merging.

Low level overview

Benchmarking

We've performed various maintenance tasks to adapt our automation to several breaking changes that come with Node 10.6. There are new constraints on Plutus cost models in genesis files which make it more difficult to set up a customized testnet, such as our benchmarking cluster. Furthermore, we've been assisting with integrating and debugging the upcoming release's components, such as tx validation in the cardano-api, or the implementation of new trace messages the Node emits.

As the new tracing system will be the default on Node 10.6, we've created a tool which, given a side-by-side listing of metrics names from both systems, will automatically generate an exhaustive migration guide for SPOs. As the metrics names have changed slightly, a migration for existing monitoring setups is required as a one-off effort. The migration guide will be published on the Cardano Developer Portal.
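The idea of deriving a guide from a side-by-side listing can be sketched as follows. This is only an illustration, not the team's tool: the function name `migration_guide`, the input shape (pairs of legacy and new names), and the output layout are all assumptions, and the metric names in the usage example are placeholders rather than real Node metrics.

```python
def migration_guide(pairs: list[tuple[str, str]]) -> str:
    """Render a plain-text table of (legacy, new) metric names that differ."""
    # Only names that actually changed need an entry in the guide.
    changed = [(old, new) for old, new in pairs if old != new]
    if not changed:
        return "No metric names changed."
    header = "Legacy name"
    width = max(len(header), *(len(old) for old, _ in changed))
    lines = [f"{header.ljust(width)}  New name"]
    lines += [f"{old.ljust(width)}  {new}" for old, new in changed]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder names, purely hypothetical.
    print(migration_guide([("legacy_metric_a", "new.metric.a"), ("same", "same")]))
```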

Whilst the legacy tracing system is still fully operational in Node 10.6, the release marks the beginning of its deprecation period - giving SPOs sufficient time to adjust their setups.

Development

While cardano-tracer logs forwarded trace messages according to detailed configs, and exposes forwarded metrics, it does not provide any built-in functionality for monitoring and alerts. The experimental RTView component, which can still be built for cardano-tracer and remains fully functional, was an attempt to provide dashboards and alerts out of the box. However, due to its restricted design and low interoperability with existing monitoring standards, it has been discontinued.

Currently, we're taking another stab at this: we're building a prototype that creates timeseries directly from observed metrics, and is able to parse and evaluate PromQL queries referring to them. Based on that prototype, we'll assess resource usage and the feasibility of fully building that feature into cardano-tracer. As most monitoring alerts can be (and are) defined as conditions on PromQL query results, and PromQL is an established industry standard, we see a low barrier to adoption. Furthermore, if built in a sufficiently modular way, it would eliminate the need to operate additional 3rd party services for scraping metrics and monitoring for alert conditions - at least in some usage scenarios.
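To make the idea concrete, here is a deliberately tiny sketch of an alert condition evaluated against an in-memory timeseries. It is not the prototype and not a PromQL parser: it only understands conditions of the form `<metric> <op> <number>`, and the names `SeriesStore`, `alert_fires`, and `demo_metric` are hypothetical.

```python
import operator
from collections import defaultdict

# Comparison operators supported by this simplified condition syntax.
OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}


class SeriesStore:
    """Keeps (timestamp, value) samples per metric name, in arrival order."""

    def __init__(self) -> None:
        self._series = defaultdict(list)

    def observe(self, name: str, timestamp: float, value: float) -> None:
        self._series[name].append((timestamp, value))

    def latest(self, name: str):
        samples = self._series.get(name)
        return samples[-1][1] if samples else None


def alert_fires(store: SeriesStore, condition: str) -> bool:
    """Evaluate '<metric> <op> <number>' against the latest sample."""
    name, op, threshold = condition.split()
    value = store.latest(name)
    # A metric with no samples yet never triggers the alert.
    return value is not None and OPS[op](value, float(threshold))


if __name__ == "__main__":
    store = SeriesStore()
    store.observe("demo_metric", 1.0, 7.0)
    print(alert_fires(store, "demo_metric > 5"))
```

A real evaluator would also need label matching, range selectors, and functions over time windows, which is where the resource usage mentioned above becomes the interesting question.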

Infrastructure

With help and support from the Consensus team (Gracias, Javier!), we were able to locate a space leak that affected on-disk benchmarks. While the current on-disk benchmarks are representative, the space leak prevented us from scaling memory limits for the Node process with finer granularity. The fix will be merged after the 10.6 release, and will be of much use when we do comparative benchmarks of the LMDB and the new lsm-trees on-disk backing stores.

Tracing

We've been working on equipping the Network team's new dmq-node service with our new tracing system, trace-dispatcher - still a work in progress. Currently, the dmq-node uses plain contravariant tracing. Having a configurable system, with an API abstraction to define metadata and message serializations as well as process metrics, is a necessary step towards production-ready tracing. The added benefit of using trace-dispatcher is the reusability of definitions already implemented in cardano-node, and a uniform way in which cardano-node and dmq-node are configured, as well as how they expose trace evidence and metrics.

The work on cardano-tracer as a library, with a principled API and intra-process communications, is ongoing. Implementation is almost complete, and will enter the testing phase soon.

Leios

We've contributed to the Leios Impact Analysis. The Performance & Tracing section summarizes how implementing, and eventually benchmarking, Leios will impact various aspects of our approach. This spans from adding new Leios-specific observables into component code, to deriving submission workloads suitable for Leios, to finding a scaling approach that lets us correlate performance observations with exact changes in Leios protocol parameters.

Additionally, we're currently working on the Leios technical design and implementation plan, which lays out our approach and derives some very high-level steps on how to realize it, based on the impact analysis.

Hydra

We've kicked off a collaboration with the Hydra team. The goal is to build system integration level benchmarks for Hydra, which can target system pressure points, and which are able to scale operations in various directions - much akin to what we're doing for the Cardano node. Eventually, those benchmarks should provide more than just immediate feedback for Hydra engineers; they should be able to yield realistic figures of what to expect from operating one (or more) Hydra heads, and what the resource requirements are to achieve that. Currently, we're familiarizing ourselves with the system and its built-in observability to assess if it meets the requirements for the benchmarks we have in mind.

Node diversity

The work on (multi-)node conformance testing is ongoing. We're in the process of creating a specification document for the semantics of existing Node traces. While a few of them might be unique to the Haskell implementation, the majority document how Cardano itself runs and behaves; those traces can be implemented accordingly across diverse Node projects.

Our own demonstration test case for conformance testing is fully implemented and ready to be merged after the Node 10.6 release. It validates a metric exposed by a node with respect to the trace evidence in its logs and the internal events the metric is based on; see cardano-node PR#6322.
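The general shape of such a check can be sketched as below. This is not the test case from the PR: it assumes, for illustration only, that the metric is a simple counter, that log lines are JSON objects carrying a namespace field called `ns`, and that the event namespace is the placeholder `Demo.Event`.

```python
import json
from typing import Iterable


def count_trace_events(log_lines: Iterable[str], namespace: str) -> int:
    """Count JSON log lines whose 'ns' field equals `namespace`."""
    count = 0
    for line in log_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not JSON
        if isinstance(event, dict) and event.get("ns") == namespace:
            count += 1
    return count


def metric_matches_traces(
    metric_value: int, log_lines: Iterable[str], namespace: str
) -> bool:
    """Check that a counter metric agrees with the trace evidence in the logs."""
    return metric_value == count_trace_events(log_lines, namespace)


if __name__ == "__main__":
    logs = ['{"ns": "Demo.Event"}', '{"ns": "Other"}', '{"ns": "Demo.Event"}']
    print(metric_matches_traces(2, logs, "Demo.Event"))
```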

· 2 min read
Marcin Szamotulski

Overview of sprints 98 and 99

Cardano-Node 10.6 Release

We swiftly identified and resolved an issue in the Ouroboros.Network.Server.Simple.with function. This bug broke the cardano-tracer component ([on#5224], [on#5223]). The function hadn't been used in cardano-node before. ouroboros-network-framework-0.19.2.0 was released ([chap#1157]).

We'd like to point out that cardano-node-10.6 will support only the P2P network stack.

For this coming release, we addressed some corner cases in topology parsing, cn#6304.

DMQ-Node

We continued working on dmq-node. Recently, an end-to-end test was successfully run using the Mithril Rust client, submitting signatures to a dmq-node, which propagated them to another instance of dmq-node (on#5203).

We found and fixed a bug in typed-protocol's annotated driver, which is used by dmq-node: on#5207.

A static build of dmq-node is now also available for the x86_64-linux-musl target:

nix build .#dmq-node-static

The trace & monitoring team is helping us to integrate dmq-node with the cardano-tracer library. The aim is to give SPOs a familiar user experience when monitoring dmq-node.

Peer Selection Improvements

A number of discrepancies were found and fixed in the peer selection logic. The peer-selection and net-sim tests were improved: on#5209, on#5232

WASM support in Ouroboros-Network

The node team contributed partial WASM support to ouroboros-network, on#5229.

Technical Debt Reduction

Ouroboros-Network

We reorganised the ouroboros-network package structure to improve maintainability and simplify our release process, on#5200.

  • cardano-diffusion: everything related to diffusion for Cardano purposes, in particular cardano-node, but not only.
  • ouroboros-network-api, ouroboros-network-framework, ouroboros-network-protocols packages are now sublibraries of ouroboros-network. Some Cardano-specific APIs are only present in cardano-diffusion.
  • cardano-client package is now just a sublibrary: cardano-diffusion:subscription.
  • cardano-ping will become a sublibrary: cardano-diffusion:ping once on#5205 is merged.

Consensus

We addressed some outstanding TODOs in the ouroboros-consensus-diffusion package, see oc#1660.

· 3 min read
John Lotoski

High level summary

The SRE team continues work on Cardano environment improvements and general maintenance.

Some notable recent changes, updates or improvements include:

  • IOE participated in a community-driven Sanchonet chain disaster recovery test event, which purposely broke Sanchonet and recovered it using the mechanisms and tools discussed in CIP-0135, including db-truncater and db-synthesizer.

  • Faster test deployment iteration of a 10.6.0 pre-release candidate is underway on some preview and preprod testnet machines, with a tight feedback loop between the SRE and dev teams for observations and debugging when issues are found.

  • A new cardano-tracer OCI container will be provided with the upcoming 10.6.0 pre-release now that the new tracing system will be default.

  • SRE team facilitated the passage of a governance action on the preview network, soon to be submitted to preprod and likely followed by a mainnet proposal and full community vote.

  • SRE team has begun providing some additional support to the Midnight Scavenger Mine project as needed until the Scavenger Mine phase completes.

Repository Work -- Merged

Cardano-node

  • This PR includes various SRE related changes for 10.6.0 pre-release readiness:

    • Bumps iohk-nix for config updates from iohk-nix PR#602
    • Fixes configuration change related CI checks.
    • Merges bp and non-bp configurations into a single config whereby ouroboros-network now automatically determines PeerSharing and Target* parameters which previously required being explicitly declared.
    • The new tracing system is now set as the default configuration; the legacy tracing system config is still made available.
    • The mainnet default topology configuration now includes a peerSnapshot declaration for making testing of GenesisMode more convenient.
    • Adjusts OCI containers for the new config setups and also includes peer-sharing configs for each network
    • Updates the nixos cardano-node service for the deprecation of useNewTopology given P2P is now the only networking mode as of 10.6.0.
    • Updates the nixos cardano-node service for new tracerSocketNetworkAccept and tracerSocketNetworkConnect cardano-tracer connection options.
    • Updates the nixos cardano-node service to support SRV peer records.
    • Updates the nixos cardano-tracer service for option name changes of acceptingSocket to acceptAt and connectingToSocket to connectTo; related workbench services were also updated accordingly.
    • For the binary releases, the cardano-submit-api config and peer-sharing config were added.
    • The default cardano-submit-api config was made compatible with the new tracing system.

    cardano-node-pr-6300

Repository Work In Progress -- PRs and Branches

· 2 min read
Jean-Philippe Raynaud

High level overview

This week, the Mithril team continued to implement the first phase of decentralizing the configuration parameters. They also completed enhancements to the client library and CLI, providing access to Cardano database incremental snapshots by epoch. Additionally, they kept working on the design of the snarkification of the certificates.

Finally, the team added a section about compatibility with the Cardano node in the GitHub release notes and worked on adapting to the security changes for NPM package publication.

Low level overview

Features

  • Completed the issue Provide Cardano database incremental snapshots needed for Amaru bootstrap #2704
  • Worked on the issue Decentralization of configuration parameters - Phase 1 #2692
  • Worked on the issue Integrate the Haskell DMQ node in the e2e test #2674
  • Worked on the issue Release 2543 distribution #2727
  • Worked on the design of the snarkification of the certificates

Protocol maintenance

  • Completed the issue Cardano node compatibility in GitHub release notes #2743
  • Worked on the issue Implement a common aggregator client - Phase 1 #2640
  • Worked on the issue Enhance protocol security page on website #2703
  • Worked on the issue Support NPM security changes with trusted publisher tokens #2745