· One min read
John Lotoski

High level summary

The SRE team continues work on CI and Cardano environment improvements. Some notable recent improvements include:

  • A devx-ci cluster containing a Hydra build server and Linux build farm was stood up and is intended to replace Cicero functionality
  • Cardano Sanchonet environment was stood up to test Conway era functionality
  • Cardano-node NixOS service was updated to support the latest P2P topology format and a non-systemd socket activation use case
  • Cardano-node 8.1.1 was deployed to preview, preprod and mainnet environments

Lower level summary

Cardano-node

  • Update cardano-node NixOS service for the updated P2P topology format and non-systemd socket activation (a sketch of the topology format follows below): cardano-node-pull-5318
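
For context, the newer P2P topology format that the service now accepts looks roughly like the sketch below. The values are illustrative placeholders rather than a recommended configuration; the cardano-node documentation remains the authoritative reference for the schema.

```json
{
  "localRoots": [
    {
      "accessPoints": [ { "address": "10.0.0.1", "port": 3001 } ],
      "advertise": false,
      "valency": 1
    }
  ],
  "publicRoots": [
    {
      "accessPoints": [ { "address": "relays-new.cardano-mainnet.iohk.io", "port": 3001 } ],
      "advertise": false
    }
  ],
  "useLedgerAfterSlot": 0
}
```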

Cardano-ops

Cardano-world

Ci-ops

Ci-world

  • Add devx-ci cluster integration for caching and monitoring during the Cicero to devx-ci transition: ci-world-compare

Devx-ci

  • A CI cluster with a Hydra build server was stood up and is intended to replace usage of Cicero: devx-ci-repo

Iohk-nix

· One min read
Franco Testagrossa

High-level summary

This week, the Hydra team focused on continuing to investigate and experiment with operating a head on mainnet. They collected several bugs and issues and worked on fixing them. The team is now close to releasing a new version, 0.11.0, which comes with many improvements and bug fixes.

What did the team achieve this week

  • Restored our head on mainnet and fixed the bug that had stalled it #927
  • Solved one user issue #914
  • Significantly reduced the local state size and logs by removing full scripts from the stored state #928
  • (pending review) Reduced snapshot size in the API by including only transaction ids #922

What are the goals of next week

  • New release 0.11.0
  • Monthly report & review meeting
  • Fix some minor bugs discovered when operating our head on mainnet
  • Complete journey for external commits using multiple script UTxOs #903
  • Publish benchmarks and provide regular benchmarks for Hydra #186

· 2 min read
Jean-Philippe Raynaud

High level overview

The Mithril team completed the implementation of the new sub-command for restoring a Mithril stake distribution in the client. They also updated the client’s developer documentation and architecture documentation, and did some refactoring on the client and its dependency injection mechanism. Additionally, they completed and deployed infrastructure enhancements on the test Mithril networks. They also completed the performance tests of the new stake distribution computation on the Cardano mainnet.
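
A rough sketch of how the new sub-command is invoked is shown below; the exact command names, options, and environment variables here are assumptions and may differ between releases, so the Mithril client documentation is the reference.

```sh
# Assumed invocation shape for the new client sub-command (illustrative only).
export AGGREGATOR_ENDPOINT=https://aggregator.release-preprod.api.mithril.network/aggregator
mithril-client mithril-stake-distribution list                      # list available artifacts
mithril-client mithril-stake-distribution download <ARTIFACT_HASH>  # restore and verify one
```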

Finally, the team worked on fixing a bug on the client multi-platform test, a bug on the aggregator state machine, and some flakiness on the CI.

Low level overview

  • Completed the epic that designs and implements generic signing/verification of entity services #780:
    • Completed the issue Create the sub-command for 'Mithril Stake Distribution' in client #896
    • Completed the issue Adapt end to end tests to handle new types of data #899
    • Completed the issue Update client documentation #897
    • Completed the issue Update architecture documentations for new types of data #898
    • Completed the issue Refactoring client #960
  • Worked on the epic that prepares the Mithril infrastructure for mainnet #767:
    • Completed the issue Enhance terraform infrastructure #930
  • Completed the epic that implements the computation of the stake distribution for mainnet #880:
    • Completed the issue Check performance impact of new stake distribution command on the 'mainnet' #962
  • Worked on the epic Prepare Mithril Signer deployment model for SPO #862:
    • Worked on the issue Design recommended deployment model for SPOs on 'mainnet' and 'preview'/'preprod' #961
  • Worked on bugs and optimizations:
    • Completed the issue Aggregator does not always detect new immutable file #953
    • Completed the issue CI tests fail with Rust '1.70.0' #958
    • Worked on the issue End to end tests are flaky #954
    • Worked on the issue Certificate dates in metadata are not on the same timezone #946
    • Worked on the issue Refactor 'MithrilStakeDistribution' entity #967
    • Completed the issue Fix 'Mithril Client multi-platform test' with new client interface #956
    • Completed the issue Enhance 'ImmutableDigesterError::NotEnoughImmutable' error #969
    • Completed the issue Client 'snapshot download' command fails with option '--download-dir' #979

· 2 min read
Damian Nadales

High level summary

The Consensus team had a very productive meeting with IOG researchers. We now seem to be aligned on a strong argument that the Byron and TPraos eras do not need to be checkpointed for an MVP. One question remains (which also applies to the Praos era): how do we assess the threat that short forks pose during historical windows in which chain growth underperformed? We are currently collaborating on that. We also drafted an argument that the updated "Limit on Patience" timeout sufficiently bounds how long an adversary can inflate a victim's overall sync time.

On the UTxO-HD front, the prototype branch was rebased on top of the latest ouroboros-consensus main branch and integrated on top of cardano-node 8.1.1-pre. As a result, the recently released mempool fairness fix is now integrated into UTxO-HD. We managed to run a node with UTxO-HD enabled again. We also identified and fixed a race condition in the UTxO-HD prototype. In addition, we started performing ad-hoc UTxO-HD benchmarks for cardano-node, which uncovered a performance regression in the network component when using GHC 9.2/9.4. This is being addressed.

Regarding our support activities, we released fs-sim-0.2.0.0 and are in the process of preparing the 8.2 release of cardano-node. We also identified, and started fixing, unexpectedly unevaluated thunks in preparation for enabling NoThunks tests in CI.
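
To illustrate what such a test guards against, here is a minimal, hypothetical sketch (not ouroboros-consensus code) using the nothunks library: it reports unevaluated thunks hiding inside a value, which in long-lived state would amount to a space leak.

```haskell
{-# LANGUAGE DeriveGeneric #-}
import GHC.Generics (Generic)
import NoThunks.Class (NoThunks, unsafeNoThunks)

-- Illustrative type only: a lazy field can silently accumulate thunks.
data Counter = Counter { total :: Int }
  deriving (Generic)

instance NoThunks Counter  -- default implementation derived via Generic

main :: IO ()
main = do
  let c = Counter (1 + 2)             -- (1 + 2) is stored unevaluated
  c `seq` print (unsafeNoThunks c)    -- Just <thunk info>: the lazy field holds a thunk
  let c' = Counter $! (1 + 2)         -- force the field before construction
  c' `seq` print (unsafeNoThunks c')  -- Nothing: no thunks remain
```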

· 2 min read
Michael Karg

High level summary

  • Benchmarking: We've continued release benchmarking and established a new baseline for 8.0.0.
  • New tracing: Our benchmarking profile for measuring new vs. legacy tracing performance has been refined.
  • Nomad backend: The healthcheck system for the Nomad cloud has been completed. We've performed the first full runs on the new backend.

Low level overview

Benchmarking

In our release benchmarking cycle, we established a new performance baseline for 8.0.0. Additionally, we've measured performance under various workloads for 8.1.1-pre; the results look promising and validate the optimization efforts done on several system components.

In the meantime, we've finalized a build plan with GHC 9.2 that matches the current one with GHC 8.10: a requirement for benchmarking, as a large number of differences in the dependency graph could confound the results for the application code proper.
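
One hypothetical way to surface such dependency-graph differences, assuming cabal's plan.json output (under dist-newstyle/cache/) and jq rather than the team's actual tooling, is to extract and diff the package-version pairs of both plans:

```sh
# Extract "name version" pairs from each build plan and diff them
# (the ghc810/ and ghc92/ paths are illustrative placeholders).
jq -r '."install-plan"[] | ."pkg-name" + " " + ."pkg-version"' \
   ghc810/plan.json | sort -u > ghc810-deps.txt
jq -r '."install-plan"[] | ."pkg-name" + " " + ."pkg-version"' \
   ghc92/plan.json | sort -u > ghc92-deps.txt
diff -u ghc810-deps.txt ghc92-deps.txt  # differences here could confound benchmarks
```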

Tracing

The legacy and the new tracing systems differ fundamentally in design, implementation, and handling. For metrics to be meaningful in a comparison, benchmarking profiles have to be tuned such that not only the overall log-line frequency but also the frequency of specific trace messages are closely aligned. We've found that higher granularity in this regard was necessary, and have done additional work on our dedicated profiles.
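
As a minimal illustration of such a frequency check, the hypothetical helper below tallies trace messages by namespace from newline-delimited JSON logs; it assumes each line carries a namespace field named "ns" and is not the team's actual analysis pipeline.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Aeson as A
import Data.Aeson ((.:))
import qualified Data.ByteString.Lazy.Char8 as BL
import qualified Data.Map.Strict as M

-- One parsed log line; only the namespace field is of interest here.
newtype Msg = Msg { ns :: String }

instance A.FromJSON Msg where
  parseJSON = A.withObject "Msg" $ \o -> Msg <$> o .: "ns"

-- Read JSON log lines from stdin and print a per-namespace message count.
-- Comparing tallies from a legacy run and a new-tracing run shows whether
-- specific message kinds, not just overall line counts, are aligned.
main :: IO ()
main = do
  ls <- BL.lines <$> BL.getContents
  let tally = M.fromListWith (+) [ (ns m, 1 :: Int) | Just m <- map A.decode ls ]
  mapM_ (\(k, v) -> putStrLn (k <> "\t" <> show v)) (M.toAscList tally)
```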

Additionally, we've had a first look at what additional traces could be valuable in the context of benchmarking UTxO-HD.

Nomad backend

As the first iteration of the new backend's healthcheck system can now serve as a guardrail to ensure the sanity of a full-length run, we've performed our first 52-node cluster runs on Nomad cloud. We're currently smoothing the edges around cluster deployment and analysing the metrics gathered from those runs.

This means the backend is entering its validation phase, in which we systematically compare all metrics taken from the new infrastructure to those from the existing one, including determining reproducibility and variance.