
· 2 min read
John Lotoski

High level summary

The SRE team continues work on cardano environment improvements and general environment maintenance.

Some notable recent changes, updates or improvements include:

  • Cardano-node 8.9.1 is now deployed to all environments.

  • The legacy IOG mainnet metadata server has been retired, with the Cardano Foundation (CF) now providing metadata server services going forward.

  • Cardano-parts PR#35, merged and linked below, offers IP information integration into nixosConfiguration modules, as well as template-diff and template-patch recipes for easier upgrades going forward.

Lower level summary

Capkgs

  • Adds a shortRev suffix to package names, fixes an rclone recipe, fixes a CI push action, defaults to recursively dereferenced object hashes, cleans up reference patterns. See the PR description for more details: capkgs-pull-2

Cardano-mainnet

  • Bumps to cardano-node 8.9.1 and deploys all machines, makes IP information available in nixosCfgs, adds new expected machine alerts, tunes snapshot alerts and implements all updates from cardano-parts PR#35. See the PR description for more details: cardano-mainnet-pull-10

Cardano-ops

Cardano-parts

  • Upgrades cardano-node to 8.9.1 for both release and pre-release, integrates machine IP information into nixosConfigurations, enables /etc/hosts file usage in cardano-node topology, enhances cardano-node topology producer generation with customizable address types, introduces template patching recipes for easier cardano-parts updates to existing clusters. Much more detail is available in the PR description: cardano-parts-pull-35
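
With /etc/hosts usage enabled, local-root producers in the topology can be referenced by hostname rather than a hard-coded IP. A minimal sketch of what such a P2P topology file could look like (hostnames and values here are hypothetical, and the exact schema depends on the cardano-node version):

```json
{
  "localRoots": [
    {
      "accessPoints": [
        { "address": "rel-a-1", "port": 3001 },
        { "address": "rel-a-2", "port": 3001 }
      ],
      "advertise": false,
      "valency": 2
    }
  ],
  "publicRoots": [],
  "useLedgerAfterSlot": -1
}
```

Here `rel-a-1` and `rel-a-2` would resolve via /etc/hosts entries generated from the machines' IP information in the nixosConfigurations.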

Cardano-playground

  • Bumps to cardano-node 8.9.1 and deploys all envs, rotates KES keys in most envs, makes IP information available in nixosCfgs and implements all updates from cardano-parts PR#35. See the PR description for more details: cardano-playground-pull-18

· 2 min read
Sebastian Nagel

High-level summary

This week, the Hydra team conducted the monthly review meeting and investigated a broken head situation. The team slightly improved Conway forward compatibility in the explorer and hydra-node, enhanced the hydra-cluster --devnet sandbox to allow e2e testing of kupo, extended the smoke test to also include committing ADA into the head, documented the anticipated behavior of incremental decommits, and added decommits to the tutorial.

What did the team achieve this week

  • Conducted the monthly review meeting (recording link to follow)
  • Investigated a broken head situation #1374
  • Slightly improved Conway forward compatibility in explorer / hydra-node #1373
  • Enhanced the hydra-cluster --devnet sandbox, which allows e2e testing of kupo #1378
  • Extended smoke test to also include committing ADA into the head #1377
  • Documented the anticipated behavior of incremental decommits and added decommits to the tutorial
  • Another write-up of how the incremental commit/decommit could work (without needing merkle trees or L2/L1 interleaving) on this issue

What are the goals of next week

  • Complete the written monthly report
  • Update our head and hydraw instance to master (a release candidate)
  • Complete the improved /commit endpoint to unblock users
  • Release 0.16.0 (likely without incremental decommits)
  • Reproduce close > contest > contest scenarios using stateful testing

· One min read
Alexey Kuleshevich

High level summary

We continued focusing on adding tests and improving the test frameworks, including the quality of the generated data used in tests.

Low level summary

Conway

  • pull-4205 - Disable CC ratification when number of members is below ppCommitteeMinSize
  • pull-4169 - Add GovInfoEvent and add event testing capabilities to ImpTest
  • pull-4208 - Remove missingScriptsSymmetricDifference

Testing

  • pull-4121 - Newconstraints phase3, Add newtypes: Size, SizeSpec and class Sized.
  • pull-4197 - add unsafeMkProposals to be used for testing
  • pull-4200 - Fix prop_GOV so that it runs again
  • pull-4216 - improve the GOV generator to generate more interesting signals

Improvements

Releasing

· 2 min read
Jean-Philippe Raynaud

High level overview

This week, the Mithril team released a new Mithril distribution 2412.0. This release includes several critical updates and enhancements, such as support for the Prometheus metrics endpoint in signer, deprecation of the snapshot command in the client CLI, full Pallas-based implementation of the chain observer, and support for Cardano node v.8.9.0.
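
As a sketch of the signer's new Prometheus metrics endpoint, enabling it typically amounts to a few settings in the signer's environment; the variable names below are illustrative and may differ between versions, so the Mithril signer configuration reference should be consulted:

```
# hypothetical environment configuration for a mithril-signer service
ENABLE_METRICS_SERVER=true
METRICS_SERVER_IP=0.0.0.0
METRICS_SERVER_PORT=9090
```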

The team continued implementing the certification of Cardano transactions in Mithril networks. They focused on scaling the signature and proof generation for mainnet, kept implementing a more versatile beaconing mechanism to reduce the latency of transaction signing, and continued investigating a bug in the block parser that prevents some Conway transactions from being signed. Additionally, they started working on a prototype to decentralize signer registration with the relay and a peer-to-peer (P2P) network.

Finally, the team completed the implementation of some community-requested features to verify the structure of the output folder created by the client, and kept investigating a source of flakiness in the CI end-to-end test.

Low level overview

  • Released the new distribution 2412.0
  • Published a dev blog post about the Mithril signer Prometheus endpoint release
  • Published a dev blog post about the Mithril client CLI snapshot command deprecation
  • Completed the issue Implement a Block Range Merkle Tree for Cardano Transactions #1533
  • Completed the issue Do not require the mithril client to create the DB directory #1572
  • Worked on the issue Support multiple beacon types in signer/aggregator #1562
  • Worked on the issue Mithril relay broadcasts signer registrations with P2P PubSub #1587
  • Worked on the issue Provide fake aggregator data in an aggregated form #1594
  • Worked on the issue Some transactions are not signed in testing-sanchonet #1577
  • Worked on the issue End to end tests are flaky in CI #1558
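
The Block Range Merkle Tree work in #1533 groups transactions into contiguous block ranges and commits to each range with its own Merkle root, so membership proofs only need to touch the ranges a transaction falls in. A toy Python sketch of that general idea (not Mithril's actual implementation, which is in Rust; range size and hashing choices here are hypothetical):

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 digest used for both leaves and inner nodes."""
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Root of a binary Merkle tree; duplicates the last node on odd levels."""
    if not leaves:
        return h(b"")
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def block_range_roots(txs_by_block, range_size=15):
    """One Merkle root per contiguous range of `range_size` blocks."""
    roots = []
    for start in range(0, len(txs_by_block), range_size):
        chunk = [tx for block in txs_by_block[start:start + range_size] for tx in block]
        roots.append(merkle_root(chunk))
    return roots
```

A prover then only materializes the tree for the relevant range, rather than one giant tree over all transactions ever seen.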

· 3 min read
Michael Karg

High level summary

  • Benchmarking: Release benchmarks for 8.9.1 have been performed and analysed.
  • Development: We've implemented a benchmarking setup for UTxO-HD's LMDB (on-disk) backend.
  • Workbench: The now modular, nix-based genesis creation has been merged to master; DRep delegation and integration of a new cardano-cli command are ongoing.
  • Tracing: Benchmarking the new handle registry feature in cardano-tracer is complete; quality-of-life improvements to Prometheus output.
  • UTxO Growth: We've adjusted our framework to support running UTxO scaling benchmarks on both a single node and a cluster.
  • Nomad cluster: New multi-cluster support with the capability to quickly adjust to changes in deployed hardware.

Low level overview

Benchmarking

We've performed a full set of release benchmarks for Node 8.9.1. Comparing with release 8.9.0, we could not detect any performance risks for that version.

Development

In the context of UTxO scaling, we want to assess the feasibility of the current on-disk solution (which is LMDB) of a UTxO-HD enabled node. Using that, the UTxO set will be kept in live tables and snapshots on disk, significantly reducing memory requirements.

We've implemented a benchmark setting, and a node service configuration, supporting direct disk access to a dedicated device which can be initialized with optimized file system and mount settings. Its purpose is to serve as storage for the highly performance-critical disk I/O of the LMDB component.
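
As an illustration of such mount settings, a dedicated device for the LMDB store might be mounted with access-time updates disabled to avoid needless metadata writes; the device path, mount point, and file system below are hypothetical:

```
# hypothetical /etc/fstab entry: a dedicated NVMe device for the LMDB store
/dev/nvme1n1  /var/lib/cardano-node/lmdb  ext4  noatime,nodiratime  0  2
```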

Workbench

Our automation for creating all flavours of genesis files has seen cleanup and refactoring - which has been merged to master. It can now use a more principled, and rigorously checked, modular approach to define, create and cache the desired genesis files.

Working on integrating new cardano-cli functionality in our automation is ongoing. The performance workbench will support a different, and updated, CLI command which will allow injection of DRep delegations into genesis.

Tracing

Benchmarking cardano-tracer's new handle registry feature has been performed and evaluated. We're satisfied with seeing clear performance improvements along with cleaner code, and much better test coverage. In particular, the allocation rate and the number of garbage collections (GC) could be significantly reduced, along with the CPU time required for performing GCs. This will allow for higher trace message throughput given identical system resources - plus fewer system calls issued to the OS in the process.

Furthermore, the new tracing system is getting improvements for its Prometheus output - like providing version numbers as metrics, or annotating metrics with their type - enhancing the output's overall utility.
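
Type annotations follow the standard Prometheus exposition format; the metric names below are illustrative, not the exact names emitted by cardano-tracer:

```
# TYPE cardano_version_major gauge
cardano_version_major 8
# TYPE cardano_node_blocks_forged counter
cardano_node_blocks_forged 42
```

Annotating metrics this way lets Prometheus and downstream dashboards treat counters and gauges correctly without guessing from the name.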

UTxO Growth

The performance workbench now supports profiles aimed at simulating UTxO growth both for a single node and an entire cluster. Additionally, simulating different RAM sizes in combination with specific UTxO set sizes is supported. For a single block producing node, the focus is on quick turnaround when running a benchmark, gaining insight into the node's RAM usage and possible impact on the forging loop.

The cluster profiles enable capturing block diffusion metrics as well; however, they require a much longer runtime. We can now successfully benchmark the node's behaviour when dealing with UTxO set sizes 4x - 5x of current mainnet, as well as a possible change in behaviour when operating close to the physical RAM limit.

Nomad cluster

Our backend now supports allocating and deploying Nomad jobs for multiple clusters simultaneously - all while keeping existing automations operational. We've taken special precautions so that a cluster, as seen by the backend, can be efficiently and easily modified to reflect newly deployed, or changed, hardware. Additionally, we've added support for host volumes inside a Nomad allocation - which will be needed for benchmarking UTxO-HD's on-disk solution.