
· One min read
John Lotoski

High level summary

During the lightly staffed holiday period, node SRE focused on maintaining environment stability and on tuning or resolving noisy alerts.

Investigation into and testing around the following two topics also started during this period:

  • Ledger snapshots causing a small number of missed slots for forgers on mainnet: ouroboros-consensus-issue-868

  • A rare file descriptor leak in cardano-node, with a more detailed description here

· One min read
Carlos LopezDeLara

2023-12-09 - 2023-12-30

High level summary

  • Migrated repositories to IntersectMBO.
  • Improved era handling in cardano-api. Instead of enumerating every possible era, we now use two constructors, 'CurrentEra' and 'UpcomingEra'. This simplifies era handling, especially for cardano-api consumers, who are primarily concerned with the current mainnet era and the next era for an upcoming hard fork.
  • Cleaned up cardano-cli, in particular the Babbage-era commands, into which some Conway options had leaked.
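The two-constructor idea can be illustrated with a small Python sketch. This is not cardano-api code (cardano-api is a Haskell library); the class names, the `describe` helper and the Babbage/Conway mapping are hypothetical, chosen only to show how a consumer matches on two cases instead of one case per historical era.

```python
from enum import Enum


class Era(Enum):
    """Hypothetical stand-in for concrete ledger eras (not cardano-api's types)."""
    BABBAGE = "Babbage"
    CONWAY = "Conway"


class ApiEra(Enum):
    """Two-case view of eras, mirroring the CurrentEra/UpcomingEra idea."""
    CURRENT_ERA = "CurrentEra"
    UPCOMING_ERA = "UpcomingEra"


# Assumed mapping at the time of writing: Babbage on mainnet, Conway next.
# Only this table would change at a hard fork; consumer code below would not.
ERA_OF = {
    ApiEra.CURRENT_ERA: Era.BABBAGE,
    ApiEra.UPCOMING_ERA: Era.CONWAY,
}


def describe(era: ApiEra) -> str:
    # Consumers handle exactly two cases instead of one per historical era.
    if era is ApiEra.CURRENT_ERA:
        return f"building a transaction for the current era ({ERA_OF[era].value})"
    return f"building a transaction for the upcoming era ({ERA_OF[era].value})"


print(describe(ApiEra.CURRENT_ERA))
print(describe(ApiEra.UPCOMING_ERA))
```

The design choice being illustrated is that the era-to-constructor mapping lives in one place, so a hard fork shifts the table rather than every consumer's pattern match.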

cardano-cli

cardano-api

cardano-node

cardano-testnet

docs

CI & project maintenance

· One min read
Sebastian Nagel

High-level summary

This week, the Hydra team made significant progress, implementing an offline mode with associated refactoring. They improved the user experience by detecting incompatible eras in hydra-node, and implemented the off-chain protocol changes for incremental decommits. Additionally, the team contributed fixes to cardano-ledger and coordinated with the Eternl team on enabling committing into a head from their wallet.

What did the team achieve this week

  • Offline mode implementation #1118 and refactoring #1222
  • Detect incompatible era in hydra-node and provide better UX #1216
  • Implemented protocol changes for incremental decommits (off-chain logic) #1057
  • Contributed fixes to cardano-ledger#3949 and #3953
  • Synced up with the Eternl team on enabling committing into a head from their wallet

What are the goals of next week

  • Possibly cut release 0.15.0 to ship offline mode and the unsupported-era UX
  • Full Conway support in hydra-node
  • Transaction creation and observation for incremental decommits

· 2 min read
Marcin Szamotulski

High-level overview of sprint 51

Outbound Governor Bug in cardano-node-8.7.2

In the current sprint, we received a number of reports from SPOs about nodes not maintaining some connections when running cardano-node-8.7.2 in P2P mode. Such regressions are very important to us since they can lead to lost blocks. We were able to reproduce the issue. Every time there is a longer pause in block production (due to the statistical nature of Ouroboros), there is a chance that the bug will be armed. For this reason, nodes running cardano-node-8.7.2 need to be closely monitored.
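To give a feel for how often longer pauses occur, here is a back-of-the-envelope Python sketch. It assumes the mainnet active slot coefficient f = 0.05 with one-second slots and full stake participation, in which case each slot is empty with probability 1 - f independently; it ignores network delays and forks, and the helper names are hypothetical.

```python
# Assumption: mainnet active slot coefficient f = 0.05. With all stake
# participating, the chance that no pool leads a given slot is exactly 1 - f.
ACTIVE_SLOT_COEFF = 0.05


def prob_no_leader(slots: int, f: float = ACTIVE_SLOT_COEFF) -> float:
    """Probability that `slots` consecutive slots all have no slot leader."""
    return (1.0 - f) ** slots


def expected_gap_slots(f: float = ACTIVE_SLOT_COEFF) -> float:
    """Mean number of slots until the next leader slot (geometric distribution)."""
    return 1.0 / f


print(f"mean gap between blocks ~ {expected_gap_slots():.0f} slots")
for gap in (20, 60, 120):
    print(f"P(no block for {gap} slots) ~ {prob_no_leader(gap):.4f}")
```

Under these assumptions, a minute-long gap is rare for any given minute but happens routinely over days of operation, which is why a bug armed by such pauses can surface in production.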

We found the bug and developed a fix (ref). Karl Knutsson (CF) has not been able to reproduce the bug with the patched version of the node for long enough (almost two weeks now) for us to believe that the fix is correct.

Advice for SPOs

We created a release branch for 8.7.3. The network team advises either downgrading to the previous release, e.g. 8.1.2, or using the above release branch (note that no benchmarks or QA tests have been run on it yet).

Testing plans

We were also able to reproduce the bug using IOSim (ouroboros-network#4757). However, the bug depends on a particular schedule of the two threads involved, and we had to artificially modify the IOSim schedule in production code, something we don't want to commit to the master branch. We also experimented with a randomised scheduler for IOSim, but it did not find the schedule that arms the bug. The search space grows exponentially with the number of steps in the threads, so the partial order reduction techniques implemented in IOSimPOR are more appropriate. Unfortunately, the simulation test is too large to run in IOSimPOR, even with large amounts of RAM. To use IOSimPOR, we need to implement a test that includes the two interacting components:

  • connection-manager
  • outbound-governor (where the bug was located)

which communicate through PeerStateActions, without including all the diffusion components as our sim-net tests do. Such a test would be closer in style to the outbound-governor tests, which run a single outbound governor, than to the sim-net tests, which run multiple communicating diffusions.
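The exponential growth of the search space can be made concrete with a short Python calculation. This is not IOSim or IOSimPOR code; the helper names and the step counts are hypothetical. Two threads with a and b atomic steps have C(a + b, a) distinct interleavings, while partial order reduction only needs one representative per class of equivalent schedules. The second helper counts those classes under a stated simplification.

```python
from math import comb


def interleavings(steps_a: int, steps_b: int) -> int:
    """Distinct schedules of two threads with the given numbers of atomic steps.

    A schedule is fixed by choosing which of the steps_a + steps_b positions
    belong to thread A, hence the binomial coefficient.
    """
    return comb(steps_a + steps_b, steps_a)


def equivalence_classes(shared_a: int, shared_b: int) -> int:
    """Schedules partial order reduction must distinguish, under a simplification.

    Assumes every shared-state step of A conflicts with every shared-state step
    of B, and all other steps are thread-local (so they commute across threads).
    Only the relative order of the shared steps then matters.
    """
    return comb(shared_a + shared_b, shared_a)


for n in (5, 10, 20, 40):
    print(f"{n} steps per thread: {interleavings(n, n)} interleavings")

# Hypothetical: 40 steps per thread, of which only 3 touch shared state.
print(interleavings(40, 40), "vs", equivalence_classes(3, 3))
```

A scheduler that samples schedules at random is hunting for a handful of bug-arming schedules among this many, whereas a reduction-based explorer enumerates only the far smaller set of non-equivalent ones; the catch, as noted above, is that the system under test must be small enough for that exploration to fit in memory.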

Bootstrap peers

We continued working on bootstrap peers, ouroboros-network#4555

TxSubmission Decision Logic

We continued working on tx-submission decision logic, ouroboros-network#3311

· 2 min read
John Lotoski

High level summary

The SRE team continues work on cardano environment improvements and general environment maintenance.

Some notable recent changes, updates or improvements include:

  • A new repository was created which enables agile deployment of EC2 monitoring servers, compatible with OpenTofu grafana and mimir providers: cardano-monitoring
  • The govtool backend Swagger interface was packaged as a Nix flake and deployed for Voltaire private chain testing
  • Grafana cloud monitoring stacks were migrated to new EC2 cardano-monitoring servers
  • Cardano-db-sync state snapshots now support client range requests, details here
  • In addition to the centralized Grafana metrics on the monitoring servers, system metrics collected by sysstat are now available locally on all cluster machines at high time resolution
  • Code changes required due to repository migrations to IntersectMBO have largely been completed
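Range request support means an interrupted snapshot download can be resumed instead of restarted. Below is a minimal Python sketch using only the standard library; the URL is a hypothetical placeholder, and the helpers are illustrative, not part of any db-sync tooling.

```python
import os
import urllib.request

# Hypothetical placeholder; substitute the actual snapshot URL.
SNAPSHOT_URL = "https://example.invalid/db-sync-snapshot.tgz"


def range_header(offset: int) -> dict:
    """Header asking the server for bytes from `offset` to the end of the file."""
    return {"Range": f"bytes={offset}-"}


def resume_download(url: str, path: str, chunk_size: int = 1 << 20) -> None:
    """Fetch the remaining bytes of `url` into `path`, resuming a partial download.

    If the local file is already complete, the server typically answers
    416 Range Not Satisfiable, which urllib raises as urllib.error.HTTPError.
    """
    offset = os.path.getsize(path) if os.path.exists(path) else 0
    request = urllib.request.Request(url, headers=range_header(offset))
    with urllib.request.urlopen(request) as response:
        # 206 Partial Content: the server honoured the Range header, so append.
        # 200 OK: the server sent the whole file, so start the local copy over.
        mode = "ab" if response.status == 206 else "wb"
        with open(path, mode) as out:
            while chunk := response.read(chunk_size):
                out.write(chunk)
```

Checking the status code rather than assuming a partial response keeps the sketch correct even against a server that ignores the Range header.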

Lower level summary

Auth-keys-hub

Cardano-monitoring

  • A new repository enabling agile deployment of EC2 monitoring servers, compatible with OpenTofu grafana and mimir providers: cardano-monitoring

Cardano-parts

  • Migrate from Grafana Cloud monitoring to EC2 monitoring, add resource tagging support, a declarative Route53 CNAME list, and additional improvements and fixes: cardano-parts-pull-25
  • Improve SSH key handling and edge cases, resolve misc issues, add gp3 IOPS and throughput OpenTofu support: cardano-parts-pull-26

Cardano-playground