
· 3 min read
Marcin Szamotulski

Network Update

Ouroboros Network

Ouroboros Consensus

  • Recently we found out that the consensus layer does not log exceptions thrown during initialisation. This was fixed in PR input-output-hk/ouroboros-network#4015. As part of this pull request we also made sure that all exceptions rethrown by the connection handler thread are wrapped in ExceptionInHandler (a simplified sketch follows below).
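A minimal sketch of the wrapping pattern described above, in isolation. The types and function names here are simplified stand-ins, not the actual ouroboros-network API:

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Control.Exception

-- Simplified stand-in: pair the peer the handler was serving with the
-- original exception, so logs always identify where it came from.
data ExceptionInHandler =
  forall addr. Show addr => ExceptionInHandler addr SomeException

instance Show ExceptionInHandler where
  show (ExceptionInHandler addr e) =
    "ExceptionInHandler " ++ show addr ++ " (" ++ show e ++ ")"

instance Exception ExceptionInHandler

-- Run a connection handler; any exception escaping it is rethrown wrapped,
-- so the caller (and the logs) see an ExceptionInHandler with the peer
-- attached rather than a bare exception with no context.
runHandler :: Show addr => addr -> IO () -> IO ()
runHandler addr action =
  action `catch` \e -> throwIO (ExceptionInHandler addr (e :: SomeException))
```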

Some older items, which were not announced

  • We identified and fixed an issue related to socket activation (socket options were not set for sockets passed through socket activation); see the sketch below. PR input-output-hk/cardano-node#3979. This fix will be released in the next cardano-node release.
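As a rough illustration of what the fix amounts to: a socket inherited via systemd socket activation skips the code path on which a freshly created socket would normally be configured, so the relevant options have to be applied explicitly to the inherited socket. The options shown below are examples, not the exact set configured by cardano-node:

```haskell
import Network.Socket (Socket, SocketOption (..), setSocketOption)

-- Apply the usual socket options to a socket inherited through socket
-- activation, since it never went through the normal creation path.
configureInheritedSocket :: Socket -> IO ()
configureInheritedSocket sock = do
  setSocketOption sock NoDelay   1  -- e.g. disable Nagle's algorithm
  setSocketOption sock ReuseAddr 1  -- e.g. allow quick rebinding after restart
```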

Cardano Node

  • We extended the NixOS service module so that one can modify the socketPath, runtimeDir, databasePath, traceSocketPathAccept, traceSocketPathConnect and stateDir options. PR input-output-hk/cardano-node#4196.

IO-Sim

We resolved a number of issues before the release of io-sim on Hackage:

See PR #24.

We also improved the experience for contributors to io-sim and typed-protocols by adding issue templates.

Typed Protocols

Input Endorsers Simulation

New features include:

  • Histograms of block arrival frequency, for both network (inbound) and CPU (block validation). This is useful for checking that we are not overloading the CPU block validation capacity or the network link capacity, or alternatively for observing the behaviour in an overload situation if we set the block generation rate high enough.

  • Pie chart of utilisation of TCP links. This shows how small a fraction of links is being used at any one time, and that once the system "warms up" and is operating stably, most block delivery is ballistic.

  • Showing off the new screen layout combinators, which let us put multiple charts, titles, etc. on screen at once and scale them to whatever screen or video resolution we like without having to tweak numbers (this example is scaled to fit 1080HD video resolution).

· 4 min read
Damian Nadales
  • We proposed a fix for the performance degradation observed when running distributed multi-node benchmarks in the UTxO HD feature branch. While this fixed the problems observed when running local benchmarks, it broke the ThreadNet tests due to concurrency issues. Therefore, we think it is wise to start redesigning the UTxO HD mempool integration.
  • We did several rounds of code review on the alternative implementation of diff sequences required by the UTxO HD feature, based on the idea of anti-diffs. This alternative implementation is close to being merged, and the next step is to integrate it into the UTxO HD branch so that we can run ad hoc replaying and syncing-from-scratch benchmarks and compare these with the baseline. The micro-benchmarks we elaborated for the alternative implementation show speedups of up to 4x, so we are optimistic about the performance of the replaying and syncing-from-scratch benchmarks. However, it is important to note that, due to the nature of UTxO HD, we will still be slower than the baseline.
  • The final draft of the Genesis implementation specification is ready for review.
  • We implemented a prototype for the happy path of Genesis' ChainSync Jumping (CSJ). The prototype is slower than the baseline; however, the measurement was not taken with the latest version of the prototype, and the jump interval used is very small.
  • Work on integrating Conway has stopped since priorities have changed.
  • We started work on benchmarking epoch boundaries and epoch overhead (pr-4014). To this end, we made use of a modified version of our db-analyser tool (a simplified sketch of the measurement follows this list). We ran the new benchmarking setup on the Cardano mainnet chain, and we can see that block tick and application take substantially longer at epoch boundaries, although there are also a couple of slots during an epoch in which these computations take longer than normal. We notified the ledger team about these findings. We will use this modified version of db-analyser to investigate the epoch overhead.
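The sketch below shows, with made-up names, the kind of per-block measurement such a db-analyser pass performs: time how long it takes to tick the ledger state to the block's slot and then to apply the block, so that slots at epoch boundaries can be compared with ordinary slots. It is illustrative only, not the actual tool:

```haskell
import Control.Exception (evaluate)
import Data.Time.Clock (NominalDiffTime, diffUTCTime, getCurrentTime)

-- Measure the wall-clock time an action takes.
timed :: IO a -> IO (a, NominalDiffTime)
timed action = do
  start  <- getCurrentTime
  result <- action
  end    <- getCurrentTime
  pure (result, diffUTCTime end start)

-- Hypothetical per-block measurement: time the tick and the application
-- separately, returning the new state together with both durations.
benchBlock :: (st -> st)  -- tick the ledger state to the block's slot
           -> (st -> st)  -- apply the block to the ticked state
           -> st
           -> IO (st, NominalDiffTime, NominalDiffTime)
benchBlock tick apply st = do
  (ticked,  tickTime)  <- timed (evaluate (tick st))
  (applied, applyTime) <- timed (evaluate (apply ticked))
  pure (applied, tickTime, applyTime)
```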

Workstreams

UTxO HD

  • We spent quite some time investigating the root cause of the performance degradation observed in the benchmarks. We ran the make forge-stress benchmarks locally in order to debug this behavior.

    • Transaction batching doesn't make a notable difference in the outcome (considering we are using the in-memory backend).

    • The mempool batching implementation required asynchronous transaction validation, which is a violation of the LocalTxSubmission protocol contract; therefore, if we had continued down that route, the impact would have been quite big.

    • The STM logic we implemented using a TMVar for the mempool's internal state was buggy, and under certain circumstances it seemed to lock up. Reverting to storing the mempool's internal state in a TVar seems to solve this problem (see the sketch after this list).

    • The results we get after this change look almost identical to the ones from the baseline.

  • The anti-diff prototype (PR #3997) has been reviewed and is close to being merged.

    • A follow-up issue (issue #4010) to integrate the anti-diff prototype into the various consensus packages was created. A first version of the integration exists, and all tests pass. The next step is to get some indication of the "real" performance gain by profiling db-analyser (or cardano-node).
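A minimal sketch of the mempool-state change mentioned above, with made-up types rather than the real consensus ones: keeping the internal state in a TVar means readers never block and writers update the state in a single STM transaction, whereas a TMVar can be left empty if a thread stops between taking and putting it back, which matches the lock-ups we observed:

```haskell
import Control.Concurrent.STM

-- Hypothetical, simplified mempool: the internal state is just a list of
-- transactions stored in a TVar.
newtype Mempool tx = Mempool (TVar [tx])

newMempool :: IO (Mempool tx)
newMempool = Mempool <$> newTVarIO []

-- Adding a transaction is a single atomic update; there is no separate
-- take/put pair that could be interrupted half-way.
addTx :: Mempool tx -> tx -> IO ()
addTx (Mempool var) tx = atomically $ modifyTVar' var (tx :)

-- Readers never block: they simply observe the current state.
snapshotTxs :: Mempool tx -> IO [tx]
snapshotTxs (Mempool var) = readTVarIO var
```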

Genesis

  • Final draft of the Genesis implementation specification, now up for review.

  • Local benchmark setup for parameter tuning via the happy path ChainSync Jumping (CSJ) prototype (Issue 3987).

    • Context: Our Genesis design requires us to check in with a large number (~20) of servers periodically while syncing. These servers are offered jump requests via the ChainSync protocol (hence the name), which they can accept or decline. If a peer declines, the Genesis rule allows us to determine whether that node actually has a better chain.

    • The "happy path" is when no peer declines a jump. We want this to have close to no overhead compared to status quo, i.e. syncing without Genesis.

    • We implemented a prototype for this happy path, and are now starting to test it in various configurations (number of peers, latency, bandwidth) to tune the performance of ChainSync Jumping, i.e. how complicated our logic for choosing when to jump needs to be. (A simplified sketch of the happy path follows this list.)

      Example:

    • Simulated connection: 50 MBit/s, 50ms latency

    • Jump interval: 3000 slots (on the low end, could be increased to up to 3k/f)

    • Red: baseline (1.35.3), one peer in topology file

    • Blue: Preliminary version of our prototype, with 10 peers.

      It is slower by about 30%, but this is not the latest version of the prototype, and the jump interval is very small, making CSJ more of a bottleneck.
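As a rough sketch of the happy path being tuned here (made-up types, not the actual prototype): one peer, the dynamo, serves headers as usual, and at every jump interval the remaining peers are offered the dynamo's current point; as long as they all accept, they stay passive and add close to no overhead:

```haskell
-- Hypothetical result of offering a jump to a single peer.
data JumpResult = AcceptedJump | DeclinedJump

-- Offer the dynamo's current point to every other peer; the happy path
-- continues only if all of them accept. A decline would trigger the
-- comparison prescribed by the Genesis rule (not sketched here).
offerJumps :: Monad m
           => point                    -- dynamo's current point
           -> [point -> m JumpResult]  -- one jump request per peer
           -> m Bool                   -- True = happy path continues
offerJumps point peers = do
  results <- mapM ($ point) peers
  pure (all accepted results)
  where
    accepted AcceptedJump = True
    accepted DeclinedJump = False
```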

Technical debt

  • Fix flakiness in ChainDB QSM tests (PR 3990).

· 2 min read
Kostas Dermentzis

DBSync Update

New Tag

We created a new db-sync tag, 13.0.5, which addresses shortcomings of the last release, 13.0.4. It is currently under testing. The changelog is here, and in more detail:

  • We fixed fees for transactions with a phase-2 failure that didn't include a total collateral field. 1248

  • We fixed an issue that could cause db-sync to crash if a specific rollback occurred. 1247

  • DBSync now avoids reserialising data, especially datums; reserialising not only slows down db-sync but could also result in the wrong CBOR encoding being inserted (see the sketch below). 1217

  • All the fixes above come with unit tests that validate them.

  • We added support for preprod and preview in Docker. DBSync no longer needs to include the configs for different networks; these are fetched directly from Cardano World. 1254

  • We improved Docker support for the new disable options, and improved the overall documentation. 1260

All the above were also backported to the master branch.
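The idea behind the reserialisation fix, sketched with made-up types (not the actual db-sync code): CBOR allows the same value to be encoded in more than one way, so decoding a datum and re-encoding it may not reproduce the original bytes. The safe approach is to carry the original bytes alongside the decoded value and insert exactly those bytes into the database:

```haskell
import qualified Data.ByteString as BS

-- Keep both views of a datum: the decoded value for queries, and the exact
-- on-chain CBOR bytes, which are what gets written to the database.
data StoredDatum value = StoredDatum
  { datumValue         :: value          -- decoded form, used for queries
  , datumOriginalBytes :: BS.ByteString  -- original CBOR, inserted unchanged
  }
```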

Open source

We made the db-sync board public, so everyone can have access to the issues we prioritise. We also added new tasks to the board, some of which may be approachable for newcomers or people who want to contribute.

Progress on tech debt and new features

  • 1223 was merged, which removes the foreign keys from the db schema. This opens the road to a number of optimizations.

  • An additional fix on top of the previous work was added. 1250

  • An initial version where DBSync does not roll back on restart is done here: 1266. This allows db-sync to restart much faster, without needing to delete data and reinsert it. In the future it can also facilitate migrations in cases where the ledger snapshots have a breaking change, without the need to resync everything from genesis.

· 2 min read
Jared Corduan

Ledger Update

We have been focused nearly entirely on addressing technical debt.

  • We introduced more consistent naming across eras, this time for the auxiliary data. See 3032.
  • We made it clear how the consumed function differs between eras (which was a previous source of confusion), and added some related support to the fledgling ledger API. See 3016.
  • We added clarity and organizational consistency to the main ledger era type synonyms. See 3017.
  • We removed code duplication related to the input data hashes. See 3018.
  • We split up a large module into smaller components. The large module was actually causing our CI to time out. See 3020.
  • We cleaned up stale information in our cabal files, and upgraded to Cabal 3.8. See 3023, 3031, and 3028.
  • We made consistent, standalone TxOut (transaction output) modules for every era. See 3024.
  • We brought consistency to a maddeningly inconsistent use of type variables indicating the specific choice of cryptographic primitives. In particular, all uses of crypto have been renamed to c. See 3027.
  • We did a clean-up of the types in the Alonzo era. In particular, we switched to more parametric types that will compose better in the future and that simplify the constraints. See 3029.
  • We consolidated some existing fragmented logic regarding how we gather the scripts needed for a given transaction. This is a much-needed cleanup to prevent future mistakes. See 3019.
  • We fixed a problem with our generators that was causing a fair number of our property tests to fail in CI. See 3039.
  • We have started the work to update Plutus. This will bring support for SECP in the next major protocol version, and will also address a problem that we currently have evolving the cost models. See 3030.
  • We addressed a small issue that came up when integrating the conway era downstream, namely the lack of some serialization instances. See 3022.

· 2 min read
Jared Corduan

Ledger Update

Since finishing up support for the Vasil Hardfork, the ledger team has been focused on two main things: a new ledger era and technical debt.

New minimal ledger era

We have implemented a new ledger era named conway which is nearly identical to the babbage era. This is the first time we have been able to see what a minimal ledger era looks like. We have finished this task, modulo any integration issues that might come up. The only thing that the conway era does differently from the babbage era is provide support for rotating the master keys using the hardfork combinator's state translation. We may end up adding features to the conway era, but it has been a nice exercise to see what it takes to get a minimal ledger era supported in all the downstream components.

Addressing technical debt

We have been addressing technical debt, mostly in an effort to make the repository a more friendly code base to work in.

  • We have begun work on a ledger API, called cardano-ledger-api.
  • We have done a big re-design of the major type classes used in the ledger. With hindsight on our side, we now have something much more organized and easier to use.
  • We have done a lot of re-naming. The names across eras are now much more uniform, avoid certain confusions that plagued us, and make it clearer where things come from.
  • We have reduced a lot of code duplication that could lead to bugs if you do not have the whole code base in your head.
  • We have added a handful of performance improvements.
  • We added type safety in a number of locations. In particular, the type of values that can be minted in a transaction no longer allows for Lovelace, and some functions which used to handle both timelock scripts and Plutus scripts now correctly enforce at the type level that only one of them can be used. (A simplified sketch follows this list.)
  • We changed our generators so that they now produce a much richer set of valid serializations. There is room within CBOR to serialize the same data structure in multiple ways, and it is helpful to have the generators use a wide variety.
  • We have begun re-organizing our test suites.
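A hedged, simplified illustration of the first kind of restriction mentioned above (these are not the actual ledger types): a transaction output carries a value with both Lovelace and other assets, while the mint field uses a type with no Lovelace component at all, so "minting ada" is unrepresentable:

```haskell
import Data.Map.Strict (Map)

newtype Lovelace = Lovelace Integer

-- A transaction output's value can contain ada as well as other assets ...
data Value policyId assetName = Value
  { valueLovelace :: Lovelace
  , valueAssets   :: Map policyId (Map assetName Integer)
  }

-- ... but the mint field is given a type with no Lovelace component, so the
-- type checker rules out minting ada altogether.
newtype MultiAsset policyId assetName =
  MultiAsset (Map policyId (Map assetName Integer))
```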