
· One min read
Damian Nadales

High level summary

This week the consensus team continued working on the improved DB lock mechanism for UTxO-HD, and on the modifications to the mempool benchmarks that this prototype requires.

On the Genesis front we validated that the fragment size calculation in BlockFetch is a major performance sink for ChainSync Jumping. By removing it we will get performance that is acceptably close to that of the baseline. We also started investigating a performance fix that does not alter the existing baseline behavior too much. In addition we reviewed our Genesis attack vector calculations.

On the support front we released Consensus 0.4, and we are working on improving our release process to support the Cardano-wide efforts in this area. We also performed an analysis of the number of file descriptors that consensus uses. Node operators can use this information to check whether the number of file descriptors they allow is enough.
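
As a rough illustration of such a check (our own sketch, not the consensus team's analysis and not tooling shipped with the node): on Linux, a Haskell process can compare its own open descriptors under /proc/self/fd against the soft limit reported by the operating system.

```haskell
-- Illustrative only: compare this process's open file descriptors
-- against the soft limit. Linux-specific (relies on /proc).
import System.Directory (listDirectory)
import System.Posix.Resource
  (Resource (ResourceOpenFiles), ResourceLimit (..), getResourceLimit,
   softLimit)

main :: IO ()
main = do
  open   <- length <$> listDirectory "/proc/self/fd"
  limits <- getResourceLimit ResourceOpenFiles
  let limit = case softLimit limits of
        ResourceLimit n       -> show n
        ResourceLimitInfinity -> "unlimited"
        ResourceLimitUnknown  -> "unknown"
  putStrLn ("open file descriptors: " ++ show open ++ " of " ++ limit)
```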

· 3 min read
Michael Karg
  • Benchmarking: We performed a series of benchmarks aimed at the new 8.0 release branch and built a timeline from the 1.35 releases to that branch.
  • New tracing: Work on safeguarding the performance of the new tracing system is ongoing. A practical use case for data points is being tackled with Galois.
  • Analysis pipeline: We're working on automatically obtaining a detailed manifest for each run.
  • Infrastructure: The library for benchmarking Plutus scripts has been merged. Also, we've laid the groundwork for including GHC profiling data in our workbench.
  • Nomad backend: The first iteration of a distributed / multi-client Nomad cluster has been merged.

Benchmarking

We have performed various cluster runs targeting the 8.0 release branch. That way we were able to catch an inconsistency in behaviour early on. This led to the creation of a specialized workbench profile epoch-transition for local reproduction of what we observed on the benchmarking cluster.

Furthermore, we bridged the gap between the run data from the 1.35.x releases and the new 8.0.x release branch. This included walking the master branch backwards and pinpointing the order, dates, and commits of all relevant component bumps. This timeline is crucial for locating possible regressions in the new release branch, as it provides the exact points in history we would need to target with a comprehensive set of benchmarks.

Tracing

In-depth performance analysis of the new tracing system has already yielded results and helped us smooth some rough edges. However, this work is still ongoing.

In coordination with Galois, who are developing a system assurance service that observes a number of cardano-nodes, we're working on the implementation of data points which the node provides at runtime. While the view on data points must be expressive enough for the external service, the computational burden inside the node needs to be kept to an absolute minimum. We're currently exploring whether cardano-tracer could be extended with a richer feature set to that end.
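
To make the idea concrete, here is a minimal sketch of how a data point store can work; the types and functions below are our own illustration, not the actual API of trace-dispatcher or cardano-tracer. The node overwrites the latest value under a name, so publishing costs a single in-memory update, and the external service reads values on demand.

```haskell
-- Hypothetical sketch of a data point store, not the real API:
-- producers overwrite the latest value under a name; consumers read
-- on demand, keeping the cost inside the node to one map update.
import           Control.Concurrent.STM
import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

type DataPointName = String

newtype DataPointStore = DataPointStore (TVar (Map DataPointName String))

newStore :: IO DataPointStore
newStore = DataPointStore <$> newTVarIO Map.empty

-- Called inside the node: a single O(log n) insert, nothing more.
publish :: DataPointStore -> DataPointName -> String -> IO ()
publish (DataPointStore v) name val =
  atomically (modifyTVar' v (Map.insert name val))

-- Called on behalf of the external service when it asks for a value.
lookupDataPoint :: DataPointStore -> DataPointName -> IO (Maybe String)
lookupDataPoint (DataPointStore v) name =
  Map.lookup name <$> readTVarIO v
```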

Infrastructure & Analysis

Detailed manifest

A run manifest documents, among other things, the component dependencies that were used for the specific build a run was performed with. These dependencies come from different package sources, have different versioning policies, and an identical package version can exhibit different performance characteristics depending on the exact commit used for the build. This manifest will greatly increase insight into where changes in measured behaviour originated, by making all component bumps visible and accessible.
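
As a sketch of what one manifest entry could capture (the field and type names here are our own illustration, not the workbench's actual schema):

```haskell
-- Hypothetical shape of a single manifest entry; the workbench's real
-- schema may differ. Recording the commit matters because two builds
-- of the same declared version can perform differently.
data PackageSource
  = Hackage            -- public Hackage release
  | CHaP               -- cardano-haskell-packages
  | GitRepo String     -- pinned source repository
  deriving Show

data ManifestEntry = ManifestEntry
  { component :: String        -- e.g. "ouroboros-consensus"
  , version   :: String        -- declared package version
  , source    :: PackageSource -- where the build plan took it from
  , commit    :: String        -- exact commit used for the build
  } deriving Show
```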

GHC profiling inside workbench

The workbench has been equipped with a new -profnix profile flavour. This enforces a -fprof-auto build for all node-related packages. The type of profiling data generated by the GHC runtime can be customized and will enter statistical analysis. The relevant PR for this new feature has already been merged to master.
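
For context, -fprof-auto makes GHC insert a cost centre at every top-level binding, which is what the profiling report attributes time and allocation to. The toy example below shows the same effect with a hand-written SCC annotation; the names are illustrative.

```haskell
-- Toy illustration of cost centres: -fprof-auto inserts one at every
-- top-level binding automatically; here one is annotated by hand.
expensive :: Int -> Int
expensive n = {-# SCC "expensive_sum" #-} sum [1 .. n]

main :: IO ()
main = print (expensive 10000000)

-- Build with profiling enabled (e.g. ghc -prof -fprof-auto Main.hs)
-- and run with +RTS -p to get a .prof report broken down by cost
-- centre.
```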

Nomad backend

The added support for a multi-client Nomad cluster greatly improves how the backend organizes jobs and maps them onto specific instances. This improves maintainability without sacrificing flexibility. However, work on that feature is still ongoing.

· 2 min read
Kevin Hammond

Incident reporting: Cardano block production temporary outage

On Sunday, January 22, 2023, an incident occurred that paused block production for a brief period of time (approximately two minutes, similar to the usual pause at an epoch boundary). Around 50% of block-producing nodes and relays restarted during this period. Having restarted, nodes continued to produce blocks without failure. While the network continued to operate, the issue had the potential to affect network integrity, so it was flagged as a ‘critical’ incident, warranting immediate response and investigation by IOG engineers.

The investigation, carried out in collaboration with SPOs and the Cardano Foundation, quickly revealed the cause: a complex bug in data structure handling code related to the precise order of insertion/deletion of multi-asset tokens into the internal ledger record. Input Output Global (IOG) engineers, along with SPOs and DApp developers, collectively identified how to reproduce the issue as a unit test that could be included in the standard Cardano node test suite. Following successful testing, a bug fix was implemented, tested, benchmarked, and deployed as a hotfix in the node v1.35.5 release on Friday, January 27, 2023.

Care was taken not to highlight the exact cause of the bug during this process, so that it could not be exploited before SPOs had deployed the new node version. With the fix deployed, the Cardano SPO and developer community have not seen any further instances of this issue.
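
While the actual ledger code is more involved, the sketch below illustrates the general class of bug: with nested maps of multi-asset tokens, a removal that fails to prune an emptied inner map leaves a value that is semantically equal to, but structurally different from, the canonical one, and whether this happens can depend on the exact order of insertions and deletions. All type and function names here are illustrative stand-ins, not the ledger's.

```haskell
import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Illustrative stand-ins, not the ledger's actual types.
type PolicyId   = String
type AssetName  = String
type MultiAsset = Map PolicyId (Map AssetName Integer)

-- Removal that forgets to prune an emptied inner map: the result is
-- semantically "no tokens" but structurally distinct from Map.empty.
removeNaive :: PolicyId -> AssetName -> MultiAsset -> MultiAsset
removeNaive p a = Map.adjust (Map.delete a) p

-- Canonicalizing removal: dropping emptied inner maps keeps the
-- representation independent of insertion/deletion order.
removeCanonical :: PolicyId -> AssetName -> MultiAsset -> MultiAsset
removeCanonical p a = Map.update prune p
  where
    prune inner =
      let inner' = Map.delete a inner
      in if Map.null inner' then Nothing else Just inner'

main :: IO ()
main = do
  let v = Map.fromList [("policy", Map.fromList [("token", 1)])]
  print (removeNaive "policy" "token" v)      -- fromList [("policy",fromList [])]
  print (removeCanonical "policy" "token" v)  -- fromList []
```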

Further Details

You can read more details on the incident and how it was managed from SundaeSwap’s Pi Lanningham here. Thanks again to the whole community for its support in identifying and fixing this bug.

· One min read
James Chapman

The team works on applied research and consulting in formal methods that are directly applicable to evidence-based engineering in Core Tech and beyond.

High level summary

This sprint the team submitted two papers for publication, carried out consultancy with other teams, and has an opening for an intern.

Details

· One min read
Sebastian Nagel

High-level summary

This week, the Hydra team focused on improving the smoke test, fixing developer tooling, and improving the API for voting use cases. They reviewed progress on the auction, payments, and voting projects and worked on reproducing a bug in rollback handling. Moving forward, the team plans to update dependencies, implement a dirt road fix for the rollbacks bug, and explore adding Hydra support to kupo.

What did the team achieve this week

  • Reviewed progress on auction, payments and voting projects
  • Improved smoke tests so they can run on mainnet
  • Fixed a regression in the development environment and updated cardano-node used in tests
  • Improved API with more configurability to unblock voting use case
    • Exclude utxo in SnapshotConfirmed outputs #808
    • Addressed a user request by only sending Greetings once #813
  • Reproduced the rollback bug by improving our model-based test suite #784

What are the goals of next week

  • Update dependencies to match cardano-node master
  • Dirt road fix for rollbacks #784
  • Update Hydraw to maintain state locally
  • Explore adding Hydra support to kupo
  • Put up disclaimer texts and close the mainnet compatibility feature #713