
· 2 min read
Carlos LopezDeLara

2023-09-13 - 2023-09-26

High level summary

  • cardano-node 8.4.0-pre release suitable for SanchoNet.
  • CLI continues making progress integrating governance features. During this sprint we integrated the info and new-committee governance actions.
  • The team continued moving to the ERA top-level command structure. The --conway-era flag was removed from the legacy commands, so Conway era commands are now only accessible via cardano-cli conway.
  • The stake-pool command is now under the ERA top-level structure.
  • API continues integrating governance features. It is worth highlighting that ProposeNewCommitee now uses the right key type (cc-cold).

cardano-cli

cardano-api

cardano-node

cardano-testnet

docs

CI & project maintenance

· 2 min read
Sebastian Nagel

High-level summary

This week, the Hydra team conducted the monthly review meeting in collaboration with Mithril, enhancing project coordination.

The team improved the gen-hydra-key node command for smoother usability and identified concrete steps to enhance network resiliency in feature items #188, #1080, and #1079. Additionally, they contributed the aiken-mode editor integration to the aiken-lang organization, updated dependencies to utilize cardano-api 8.20, and published the Hydra security advisory CVE-2023-42806 with a workaround available for users.

These efforts demonstrate the team's commitment to project improvement, security, and open-source community collaboration.

What did the team achieve this week

  • Conducted the monthly review meeting together with Mithril
  • Improved gen-hydra-key node command #1077
  • Established a clear plan to improve network resiliency and captured it in feature items #188, #1080 and #1079
  • Moved aiken-mode (created by SN) to aiken-lang organization
  • Updated dependencies to use cardano-api 8.20 #1075
  • Published security advisory CVE-2023-42806 (workaround available)

What are the goals of next week

  • Write up the monthly report for September
  • Finish "network resilience to disconnects" #188
  • Finish kupo integration with hydra #1078
  • Discuss and decide whether to use aiken
  • Address the published security advisory CVE-2023-42806 (so that the workaround is no longer required)
  • Ideally, release 0.13.0

· 3 min read
Michael Karg

High level summary

  • Benchmarking: We've performed both low-level network and high-level variance analysis of our benchmarking clusters.
  • Infrastructure: Our reporting pipeline was adjusted to classify various workloads more easily, which reduces rework time.
  • Tracing: Work on machine-readable tracing of tracer configuration is ongoing.
  • Nomad backend: We've been able to eliminate several possible confounders on the nomad cluster.
  • Team: We're currently onboarding a new team member: Welcome to Cardano Performance & Tracing, Baldur Blöndal!

Low level overview

Benchmarking

As part of the effort to bring the Nomad backend into production use, we've been equipping both that and the existing benchmarking backend with means to measure and document network latency for each run. Furthermore we've implemented means to capture TCP packets for a limited time window during a benchmarking run - which will allow us to spot differences in the behaviour of the underlying networking stack at OS level.

Additionally, we're running variance analysis in parallel on both backends to ascertain confidence in metrics originating from either. We've concluded that baseline profile runs aren't directly comparable between the two, so we decided to compare standard deviations instead to validate the measurements from nomad.
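The idea of comparing spread rather than baseline values can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the sample values, the function names and the acceptance threshold are hypothetical and are not taken from our benchmarking tooling.

```python
from statistics import stdev

def stdev_ratio(samples_a, samples_b):
    """Ratio of the larger sample standard deviation to the smaller one."""
    sd_a, sd_b = stdev(samples_a), stdev(samples_b)
    return max(sd_a, sd_b) / min(sd_a, sd_b)

def spreads_comparable(samples_a, samples_b, max_ratio=1.5):
    """True if both backends show a similar spread for the same metric.

    max_ratio is a hypothetical threshold chosen for this sketch.
    """
    return stdev_ratio(samples_a, samples_b) <= max_ratio

# Hypothetical per-run values of one metric (in seconds). The baselines
# differ between the backends, so the means are not compared at all.
existing_runs = [0.80, 0.82, 0.79, 0.81, 0.83]
nomad_runs = [0.95, 0.97, 0.94, 0.96, 0.98]

print(spreads_comparable(existing_runs, nomad_runs))
```

A ratio close to 1 suggests both clusters scatter around their own baseline to a similar degree, even when the baselines themselves differ.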

Infrastructure

Reporting on benchmarks requires human time and effort to rework the final document. Improvements to the reporting pipeline have been merged to master. They reduce that time through various changes to the template and to the workload classification logic in analysis.

Beyond that, we've looked into issues where services would quit with an unjustified exit failure upon shutdown - under rare circumstances. By reworking shutdown logic for trace-dispatcher and tx-generator we were able to address those issues.

Tracing

After the various steps in constructing a configuration upon node startup, it is vital to document which runtime configuration the node eventually arrived at. We're working on providing a machine-readable JSON/YAML trace message for that purpose.

This will facilitate hot-reloading a node's tracer configuration in the future: users will be able to take such a trace message, apply their intended change and hot-reload it immediately into the node.
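The intended workflow (take the traced configuration, change one setting, feed it back) could look roughly like the sketch below. It is only an illustration: the JSON keys, the helper name and the shape of the message are hypothetical, as the format of the trace message is still being worked on.

```python
import copy
import json

def apply_override(config, path, value):
    """Return a copy of config with the entry at the given key path replaced."""
    updated = copy.deepcopy(config)
    node = updated
    *parents, leaf = path
    for key in parents:
        # Create intermediate objects if the path does not exist yet.
        node = node.setdefault(key, {})
    node[leaf] = value
    return updated

# Hypothetical content of a configuration trace message; not the real schema.
traced = json.loads('{"TraceOptions": {"ChainDB": {"severity": "Info"}}}')

# Apply the intended change; the original message stays untouched.
changed = apply_override(traced, ["TraceOptions", "ChainDB", "severity"], "Debug")

# The serialized result is what a user would hand back for hot-reloading.
print(json.dumps(changed, indent=2))
```

Working on a copy keeps the traced message available as a record of the configuration the node actually started with.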

Nomad backend

As with the existing benchmarking cluster, nomad is currently under scrutiny with regard to the reliability of metrics it produces, as well as the behaviour of its OS-level network stack. For instance, differing kernel versions can have an impact on our measurements, as we'd be basically using two different instruments to take them.

Along the way we've already been successful in eliminating some possible confounders that had been introduced by the nomad service or the slightly different system architecture of the new cluster.

New team member

Baldur Blöndal is an extremely capable and experienced Haskell developer. Also, he's an excellent fit for our existing team. So I'm very pleased to welcome him onboard with IOG, and with Performance & Tracing. He will be working on cardano-tracer, the component receiving, processing and making available node traces and metrics.

· 2 min read
Damian Nadales

High level summary

We have a proposed fix for the mempool forging regression observed in the UTxO-HD branch. We need to confirm this by running system-level benchmarks. We are still working on a fallback mechanism for keeping the baseline performance of the Cardano node, in case the performance of UTxO-HD is not enough. On the Genesis front, we confirmed with the researchers that the proposed Genesis design is satisfactory for the historical Cardano chain. We also have a proposed fix for the wrong protocol version bug found in SanchoNet after transitioning to Conway.

UTxO-HD

  • We optimized the mempool revalidation process, which in turn ought to solve the regression observed during system-level benchmarks in the in-memory version (349). System level benchmark results are pending.
  • Regarding the workaround to keep the node's baseline performance if that of the in-memory backend turns out not to be enough for our stakeholders (344), we are still expanding the legacy block package such that we could at some point run the node with a legacy Cardano block. There are some loose ends to wrap up before we can begin the first test run.
  • We also brought the UTxO-HD branch up to date with node version 8.4.0.

Genesis

  • We finished the discussion with the Researchers on how to argue that the proposed Genesis design is satisfactory for the existing historical Cardano chain. We are now drafting the final self-contained argument. (4157)

Support

  • We debugged a bad parameter update on the Babbage to Conway transition in the SanchoNet testnet (339). A superficial patch is within reach and we are in the process of reviewing the PRs related to this fix (340, 354, and 355). However, we are investigating a more principled redesign of the epoch transition logic, which required us to revisit the existing interfaces of the ConsensusProtocol type class and the HardForkBlock combinator (345 and 346). This is important to prevent this kind of error in the future. It is also an overdue step in the process of taking full ownership of the HFC: reconsidering original HFC design decisions for which we now, a few years later, have much more context.

· One min read
Jean-Philippe Raynaud

High level overview

This week, the Mithril team has completed the refactoring of the Terraform deployment workflows in GitHub Actions, and the implementation of snapshot compression parameters in the deployments. They kept working on the refactoring and standardization of the errors in the Mithril nodes. The team also completed the implementation of Cloudflare protection for the aggregator infrastructure and started working on its deployment and activation in the Mithril networks. Additionally, they worked on recording download statistics on the aggregator, which will be used to produce usage reports.

Finally, they kept working on the aggregator performance bottleneck that occurs with high client traffic and started creating a new distribution.

Low level overview

  • Completed the issue Add snapshot compression parameters in infrastructure deployments #1200
  • Completed the issue Add Cloudflare protection of infrastructure #986
  • Worked on the issue Record statistics about the downloaded snapshot in the aggregator #1127
  • Worked on the issue Error refactoring #798
  • Worked on the issue Activate Cloudflare protection of infrastructure #1230
  • Worked on the issue Release new 2337 distribution #1219
  • Completed the issue Upgrade dependencies #1238