
· 2 min read
Marcin Szamotulski

High-level overview of sprint 52

Happy New Year!

In this short sprint we analysed a failure that occurred on a new, large cluster run by IOG. The process exhausted all of its file handles and was left without any functional connections. The issue appears to be rare and thus does not pose a high risk.

We also continued working on tx-submission: ouroboros-network-3311.

Detailed description

It turned out that the process exhausted its file handles by leaking many open handles to /proc/{PID}/stat. We suspect that the bug is caused by

  • using lazy IO in iohk-monitoring-framework, and
  • using a recent kernel version

With lazy IO, a file is read only as far as its data is demanded, and its handle is closed only when EOF is reached. We currently suspect that a new Linux kernel added something at the end of /proc/{PID}/stat which is not parsed by iohk-monitoring-framework. If so, every time the file is read its handle is leaked (it is never closed), and eventually no file handles are left for the network layer: the accept loop doesn't return any inbound connections, nor can outbound connections be created. This issue will be addressed by the profiling team (which owns the logging subsystem).
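The suspected mechanism can be illustrated with a small analogy. The sketch below is written in Python rather than Haskell and is not taken from iohk-monitoring-framework; the function names and the sample stat-like line are hypothetical. It mimics a lazy reader that closes its handle only at EOF, and contrasts it with a strict reader that always closes the handle, whatever the parser consumes.

```python
import os
import tempfile


def lazy_fields(path, opened):
    """Yield whitespace-separated fields on demand.

    Mimics lazy IO: the handle is closed only once the input is exhausted.
    `opened` is a hypothetical list used to inspect the handle afterwards.
    """
    handle = open(path)
    opened.append(handle)
    for line in handle:
        for field in line.split():
            yield field
    handle.close()  # reached only if the consumer drains every field


def strict_fields(path):
    """Read the whole file eagerly; the handle is closed when `with` exits."""
    with open(path) as handle:
        return handle.read().split()


def demo():
    """Return (fields parsed lazily, lazy handle leaked?, strict field count)."""
    # A made-up stat-like line: the parser knows 3 fields, the file has 5.
    with tempfile.NamedTemporaryFile("w", suffix=".stat", delete=False) as tmp:
        tmp.write("1234 (node) S 99 100\n")
    try:
        opened = []
        fields = lazy_fields(tmp.name, opened)
        known = [next(fields) for _ in range(3)]  # parser stops early
        leaked = not opened[0].closed             # handle is still open
        opened[0].close()                         # clean up the demo handle
        return known, leaked, len(strict_fields(tmp.name))
    finally:
        os.unlink(tmp.name)
```

In this analogy the lazy reader leaves its handle open because the consumer never reaches EOF, while the strict reader cannot leak regardless of how many trailing fields the file contains.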

A fix will be proposed in a future release. In the meantime, we suggest monitoring the number of file handles used by the node.
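One way to observe file handle usage on Linux is to count the entries in /proc/{PID}/fd. The following is a minimal sketch under that assumption; the function names, the `proc_root` parameter and the 80% threshold are hypothetical choices for illustration, not part of any node tooling.

```python
import os


def count_open_fds(pid, proc_root="/proc"):
    """Count entries in <proc_root>/<pid>/fd, one per open file descriptor."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def fd_usage_alert(open_fds, soft_limit, threshold=0.8):
    """Return True when usage reaches `threshold` (a fraction) of the limit."""
    return open_fds >= threshold * soft_limit
```

For example, `count_open_fds(os.getpid())` reports the descriptors of the calling process. Inspecting another process requires sufficient permissions, and on Linux its limit can be read from the "Max open files" row of /proc/{PID}/limits.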

I would like to thank John Lotoski (IOG), Karl Knutsson (CF), Neil Davies (PNSol) and Michael Karg (IOG) who all contributed to this analysis.

While analysing the log we also found a few smaller issues in the outbound governor which were fixed in [ouroboros-network-#4764].

The IO error that signals file handle exhaustion is currently not visible: it is neither re-thrown nor logged. This needs to be fixed in a future version. See ouroboros-network-4769.
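To make concrete what "visible" could mean here, the sketch below shows an accept step that logs file handle exhaustion and re-raises it. It is a Python analogy, not ouroboros-network code; the logger name and both function names are hypothetical. It relies only on the standard `errno` codes EMFILE (per-process limit) and ENFILE (system-wide limit).

```python
import errno
import logging

log = logging.getLogger("accept-loop")  # hypothetical logger name


def is_fd_exhaustion(exc):
    """True if `exc` is an OSError reporting file descriptor exhaustion."""
    return isinstance(exc, OSError) and exc.errno in (errno.EMFILE, errno.ENFILE)


def accept_visible(listener):
    """Accept one connection; log file handle exhaustion and re-raise it."""
    try:
        return listener.accept()
    except OSError as exc:
        if is_fd_exhaustion(exc):
            log.error("file handles exhausted while accepting: %s", exc)
        raise  # never swallow the error, so callers can react to it
```

The design choice illustrated is simply that the error is both logged and propagated, so that operators and supervising code can each notice it.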

· One min read
Sebastian Nagel

High-level summary

In between the end-of-year holidays, the Hydra team completed the implementation of the Conway support feature, fixed a minor bug that was hindering tests from running on MacOS, and addressed a regression in the protocol-parameter formats used by the hydra-node. They also worked on off-chain code for incremental decommits, specifically focusing on transaction creation. Furthermore, they conducted a spike on implementing a Chess game using Hydra, with an experience report provided.

What did the team achieve this week

  • Fixed a regression on protocol-parameter formats used by the hydra-node #1226
  • Fixed a minor bug prohibiting tests running on MacOS #1218
  • Completed the Conway support feature #1227
  • Transaction creation off-chain code for incremental decommits #1218
  • First spike on implementing a Chess game on Hydra, with an experience report, related to #1098

What are the goals of next week

  • Fully resolve protocol parameter misalignment #1234
  • Cut a release 0.15.0 to ship offline-mode and Conway support
  • Prepare demo for Conway support
  • Complete transaction creation and observation for incremental decommits
  • Backend for a hydra-explorer that can track all heads on-chain

· One min read
John Lotoski

High level summary

During the lightly staffed holiday period for node SRE, the emphasis was on maintaining environment stability, and on tuning and resolving any noisy alerts.

Investigation into and testing around the following two topics also started during this period:

  • Ledger snapshots causing a small number of missed slots for forgers on mainnet: ouroboros-consensus-issue-868

  • A rare cardano-node file descriptor leak, with a more detailed description here

· One min read
Carlos LopezDeLara

2023-12-09 - 2023-12-30

High level summary

  • Migrated repositories to IntersectMBO.
  • Improved era handling on cardano-api. Instead of enumerating every possible era, we use two constructors: 'CurrentEra' and 'UpcomingEra'. This design simplifies the handling of eras, especially for cardano-api consumers who are primarily concerned with the current mainnet era and the next era for an upcoming hardfork.
  • Cleaned up cardano-cli, in particular the Babbage era commands, into which some Conway options had spilled.

· One min read
Sebastian Nagel

High-level summary

This week, the Hydra team made significant progress, implementing an offline mode with associated refactoring. They enhanced user experience by detecting incompatible eras in hydra-node. Protocol changes were implemented for incremental decommits, addressing off-chain logic. Additionally, the team contributed fixes to cardano-ledger and coordinated with the Eternl team on enabling committing into a head from their wallet.

What did the team achieve this week

  • Offline mode implementation #1118 and refactoring #1222
  • Detect incompatible era in hydra-node and provide better UX #1216
  • Implemented protocol changes for incremental decommits (off-chain logic) #1057
  • Contributed fixes to cardano-ledger#3949 and #3953
  • Synced up with the Eternl team on enabling committing into a head from their wallet

What are the goals of next week

  • Maybe cut a release 0.15.0 to ship offline-mode and unsupported era UX
  • Full Conway support in hydra-node
  • Transaction creation and observation for incremental decommits