· One min read
Kevin Hammond

Unexpected Ledger State Replay in the Conway era

An issue was identified shortly before the Chang hard fork: ledger state snapshots would break ledger replay in the Conway era under mainnet conditions. The ledger and consensus teams worked rapidly to resolve the issue, and a hotfix was released within 24 hours of the hard fork. To avoid pauses in node availability, users were advised not to restart their node process until they had upgraded to the hotfix. This applied to every node type: relays, block producers, DB-Sync nodes, etc.

The issue is documented here. The cause was a slight inconsistency between the ledger state snapshots that were written and those that could be read back, a side effect of the removal of pointer addresses in the Conway era. Node versions 9.1.1 and later resolve this issue.

Further Details

Node version 9.1.1

GitHub Issue

· 3 min read
Alexey Kuleshevich

High level summary

One specific Conway feature that received a lot of debate has finally been implemented: reward withdrawals are now disallowed for stake credentials that are backed by a key hash and have not delegated to any DRep. This feature will go into effect after the bootstrap phase.
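The rule described above can be sketched as a small predicate. This is only an illustration in Python, not the ledger's Haskell implementation: the names `StakeCredential`, `CredentialKind`, `withdrawal_allowed`, and the `drep_delegations` mapping are all hypothetical, and the sketch covers this one check only, not the other conditions a withdrawal has to satisfy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Mapping


class CredentialKind(Enum):
    KEY_HASH = "key_hash"
    SCRIPT_HASH = "script_hash"


@dataclass(frozen=True)
class StakeCredential:
    """Hypothetical stand-in for a stake credential (frozen, so it is hashable)."""
    kind: CredentialKind
    hash_hex: str


def withdrawal_allowed(
    credential: StakeCredential,
    drep_delegations: Mapping[StakeCredential, str],
    in_bootstrap_phase: bool,
) -> bool:
    """Apply only the DRep-delegation check to a reward withdrawal."""
    # The restriction takes effect only after the bootstrap phase.
    if in_bootstrap_phase:
        return True
    # The restriction targets credentials backed by a key hash.
    if credential.kind is not CredentialKind.KEY_HASH:
        return True
    # A key-hash credential must have delegated to some DRep.
    return credential in drep_delegations
```

Under these assumptions, a key-hash credential with no DRep delegation is rejected once the bootstrap phase has ended, while the same credential passes during bootstrap or after delegating.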

Many more tests were added, and a few minor bug fixes were implemented as well. In particular, the GovInfoEvent ledger event has been fixed; this is unlikely to affect anyone, but it is worth pointing out. A ledger state deserialization bug was also fixed, which was necessary for the cardano-node-9.1.1 release. Some tech debt was taken care of, such as cleaning up unnecessary predicate failures and fixing some flaky tests from the Byron and Alonzo eras.

Low level summary

Features

  • pull-4218 - Remove maxMajorPV from Globals
  • pull-4589 - Fix deserialization of bad Ptrs in IncrementalStake
  • pull-4555 - Disallow withdrawals to non-delegated keyhashes post-bootstrap
  • pull-4600 - Stop reporting invalid refund when stake credential is not registered
  • pull-4604 - Fix enacted Set in GovInfoEvent
  • pull-4616 - Change ConwayWdrlNotDelegatedToDRep to wrap KeyHashes
  • pull-4609 - Removed DRepAlreadyRegisteredForStakeKeyDELEG

Testing

  • pull-4565 - ENACT conformance
  • pull-4541 - Fix failing tests in cardano-ledger-alonzo-test
  • pull-4585 - Fixes a property test "Ran out of tries on suchThatT"
  • pull-4543 - Increased the probability of generating the same hash more than once
  • pull-4574 - Byron: Force startTime in genesis data to be strict
  • pull-4596 - fix both reproduceable failures
  • pull-4586 - Byron: Fix failing ts_prop_elaboratedCertsValid test by moving mainnet-genesis.json to the appropriate path
  • pull-4584 - Sort Proposals when translating to SpecRep
  • pull-4546 - Ts additions prime spec cert steps
  • pull-4607 - Refactor debug tracing of QuickCheck discards
  • pull-4597 - DELEG Imp spec

Infrastructure and releasing

  • pull-4578 - Stop generation of haddock for internal modules
  • pull-4611 - Fix haddock: remove --show-all to test
  • pull-4569 - Fix fourmolu version for pre-commit shell
  • pull-4587 - docs: update README.md
  • pull-4591 - cardano-node-9.1 backport: Implement a fix for inability to deserialize pointers in Conway
  • pull-4590 - cardano-node-9.2 backport: Implement a fix for inability to deserialize pointers in Conway
  • pull-4593 - Plutus 1.33
  • pull-4614 - Changelog for cardano-node-9.2
  • pull-4608 - Remove dependency bounds on QuickCheck

· 2 min read
Jean-Philippe Raynaud

High level overview

The Mithril team continued working on decentralizing the signature orchestration of the Mithril network. In this preliminary phase, they continued implementing a buffer store for individual signatures that may arrive before an aggregator is ready to process them. They also worked on refactoring the state machine of the signer and addressed panics occurring in both the signer and aggregator during rollbacks of Cardano transactions. Additionally, they modified the pre-loading mechanism for importing Cardano transactions so that it repeats indefinitely in the signer.

Finally, the team continued preparing the next distribution and investigated a breaking change introduced in a Hydra CI dependency.

Low level overview

  • Completed the issue Signer retrieves registrations with epoch settings route #1897
  • Completed the issue Make Cardano transactions preloading infinite in signer #1920
  • Completed the issue Seamless transition of features from unstable to stable in client WASM #1911
  • Worked on the issue Aggregator buffers signatures for unknown open message #1900
  • Worked on the issue Refactor state machine of the signer #1922
  • Worked on the issue Release 2437 distribution #1901
  • Worked on the issue Test Cardano transaction chain rollbacks #1840
  • Worked on the issue Panic on rollback on slot number not recorded in the Cardano transactions store #1929
  • Worked on the issue Breaking change in crane fails Hydra CI #1928

· 4 min read
Michael Karg

High level summary

  • Benchmarking: Release benchmarks for Node 9.1.1; additional UTxO-HD in-memory benchmarks.
  • Development: Created a local reproduction for observed UTxO-HD RAM increase.
  • Workbench: Created a new "age of Voltaire" performance baseline. Adjusted Nomad backend has entered testing phase.
  • Infrastructure: Dropping the requirement on Vault, optimizing cluster setup.
  • Tracing: New metrics naming schema was merged. Routing to internal monitoring servers is ongoing. Dropping dependency on HsOpenSSL.

Low level overview

Benchmarking

Runs and analyses for a full set of release benchmarks have been performed for Node version 9.1.1. In comparison with Mainnet releases 9.0 and 9.1.0, we determined that this version does not exhibit any performance regression.

Having been provided with the patch by Consensus targeting the increased RAM usage of the UTxO-HD in-memory backend (read below), we've performed additional benchmarks to validate the desired result on the cluster. Our measurements demonstrate that the increased memory need has now vanished. We're confident that by now we've located - and addressed - all performance risks for UTxO-HD in-memory that we can capture given the instruments at our disposal. To gain further confidence in the stability of the resource usage pattern and network metrics observed on the benchmarking cluster, we've advised running UTxO-HD nodes for extended periods under close monitoring.

Development

We succeeded in creating a local reproduction of the increase in RAM usage that was observed for the UTxO-HD in-memory backend on the cluster. That reproduction enabled the Consensus team to inspect and profile running Node processes in real time - which led to a swift identification of the underlying cause and a patch addressing it.

Workbench

After the smooth Chang hard fork, which transitioned Cardano into the Conway era, we've created - and merged - a new performance baseline. It's intended for release benchmarks and caters to the new features of the Conway ledger. Apart from incorporating the latest protocol version and Plutus cost models, it includes DRep presence in the ledger when performing measurements.

The PR preparing our workbench for a nixpkgs upgrade and removing the container-based Nomad / podman backend is complete and has entered the testing phase.

Infrastructure

Currently, our Nomad cluster uses Vault to manage access and credentials for the benchmarking cluster. As the cluster relies exclusively on static routes and fixed deployment endpoints, encoding access as a set of rules in the cloud infrastructure is a viable option. That way, we will no longer depend on the Vault service, removing the requirement to host and maintain an instance of it.

Tracing

Aligning the metrics naming schema and semantics between the new and legacy tracing systems has been completed and merged. This will let the community switch between the two systems seamlessly, as all existing configurations of monitoring services retain their validity.

As for hosting multiple EKG metrics monitors in one single service application, we ascertained that the ekg package was not built for that use case. However, we've come up with a much nicer design for cardano-tracer using dynamic routing based on the names of the nodes connected to it. It has successfully passed the prototype stage in that it's able to serve multiple EKG monitors without the need for any server restart; the full implementation is being worked on.
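To illustrate the general idea of routing by node name, here is a minimal Python sketch. It is not the cardano-tracer design or code (which is written in Haskell): the class `MetricsRouter`, its methods, and the path layout `/<node-name>` are hypothetical. The point it shows is that when the route is resolved against a table at request time, nodes can connect and disconnect without the server being restarted.

```python
from typing import Dict, Optional


class MetricsRouter:
    """Hypothetical router: resolves a URL path to one node's metrics per request."""

    def __init__(self) -> None:
        # node name -> (metric name -> latest value)
        self._stores: Dict[str, Dict[str, float]] = {}

    def connect(self, node_name: str) -> None:
        """Register a node; its route becomes available immediately."""
        self._stores.setdefault(node_name, {})

    def disconnect(self, node_name: str) -> None:
        """Remove a node; its route stops resolving."""
        self._stores.pop(node_name, None)

    def record(self, node_name: str, metric: str, value: float) -> None:
        """Store the latest value of a metric for a connected node."""
        self._stores[node_name][metric] = value

    def route(self, path: str) -> Optional[Dict[str, float]]:
        """Look up the node named by the first path segment, or None if unknown."""
        node_name = path.strip("/").split("/", 1)[0]
        store = self._stores.get(node_name)
        return dict(store) if store is not None else None
```

In this sketch, a request for `/node-1` returns `None` until `connect("node-1")` has been called, and resolves to that node's metrics afterwards, with no restart in between.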

Last but not least, both existing tracing systems rely on the snap server framework, and thus by transitive dependency, on HsOpenSSL to speak the HTTPS protocol. However, we've determined the latter package to have a risk of breaking the build, both currently and in the future (cf. HsOpenSSL#95 and HsOpenSSL#88). As a consequence, we decided to switch to the wai / warp based framework, which implements HTTPS capability differently, thus preempting the risk. The switch has already been carried out for the legacy system and is currently underway for cardano-tracer - a big shoutout to Erik de Castro Lopo for his support on that issue.

· 2 min read
Kevin Hammond

Blocks from the future

We identified two issues relating to "blocks from the future".

  1. Blocks from the near future
  2. Blocks from the far future

While blocks from the near future have been known to occur on mainnet as a result of clock skew/misconfiguration, there are no known instances on mainnet of blocks from the far future. In both cases, restarting an affected node would resolve the issue.

What is meant by a Block from the Future?

A node considers a block to be from the future if its slot is ahead of the current slot. Ouroboros Praos mandates that all chains containing blocks from the future (at that time) are ignored during chain selection. Because Praos assumes that all nodes have access to perfectly synchronized clocks, this rule never causes nodes to disregard blocks that have been minted by other honest nodes. In an actual real-world deployment, that assumption is unrealistic due to the imperfections of protocols like NTP as well as leap seconds.
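The definition above can be written as a small check. The Python sketch below is illustrative only and is not the node's implementation. The post does not say where the line between "near" and "far" future lies, so the tolerance `max_skew_slots` is a hypothetical parameter standing in for whatever clock skew a node is prepared to accept; the names `Arrival` and `classify_block` are likewise invented for this example.

```python
from enum import Enum


class Arrival(Enum):
    NOT_FROM_FUTURE = "not from the future"
    NEAR_FUTURE = "near future"
    FAR_FUTURE = "far future"


def classify_block(block_slot: int, current_slot: int, max_skew_slots: int) -> Arrival:
    """Classify a block by comparing its slot with the node's current slot.

    max_skew_slots is an assumed tolerance, not a value taken from the post.
    """
    if max_skew_slots < 0:
        raise ValueError("max_skew_slots must be non-negative")
    # A block is from the future only if its slot is ahead of the current slot.
    if block_slot <= current_slot:
        return Arrival.NOT_FROM_FUTURE
    # Ahead of the current slot, but within the assumed tolerance.
    if block_slot - current_slot <= max_skew_slots:
        return Arrival.NEAR_FUTURE
    # Ahead of the current slot by more than the assumed tolerance.
    return Arrival.FAR_FUTURE
```

With perfectly synchronized clocks, an honest block would always land in the first case; the other two cases only arise because real clocks drift or are misconfigured, or because a block is deliberately crafted.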

The issues that were identified meant that blocks from the future could potentially be used by malicious actors to create denial-of-service attacks.

Both issues are fixed in Cardano node 8.8 and later, and were eradicated at the Chang hard fork.