· One min read
Jordan Millar

Node-Api-Cli Update

2022-10-04 - 2022-10-18

Executive Summary

The majority of the team's time was split between getting 1.34.4 ready, addressing the feature requests, issues and bugs that have arisen, and refactoring components in the api and cli. The current refactoring is aimed at the long-term goal of empowering users to easily build applications similar to cardano-cli.

Completed

cardano-cli

cardano-api

cardano-node

In Progress

cardano-cli

cardano-api

cardano-node

· 3 min read
Jared Corduan

Ledger Update

We have continued focusing nearly entirely on addressing technical debt. A lot of design work has begun for the next ledger era, but we do not yet have anything concrete to share.

Technical debt issues completed

  • [issue-1676][pull-2992] We have finally removed the ledger dependency on the cardano-prelude package. It was barely used in the ledger repository, and it added a dependency that we did not want to maintain. It was a bit difficult to remove, and we had to coordinate removing it from cardano-base. A lot ended up going into pull-2992, due to the coordination effort, and we ended up updating Plutus as well. This means that we've now also made a lot of progress on the problematic cost model serialization issues described in issue-2902. In particular, after we resolve issue-3014, we will not have to wait an epoch before releasing a cost model for a new version of Plutus, as we had to do for the Vasil HF.
  • [issue-3046][pull-3055] We moved a module that is now only used in Byron to a Byron package.
  • [issue-3047][pull-3054] We improved the interface to the Value (multi-asset) type.
  • [pull-3044] We debugged and fixed a tricky compilation issue. Certain kinds of field updates were adding approximately 20 minutes to our compile time!
  • [issue-2932][pull-3036] As a part of our ongoing re-organization of the codebase, we have added a Cardano.Ledger.[Era].Core module to each ledger era that has a TxBody class. Most classes defined in the era should go in this new module. We also re-export the Cardano.Ledger.Core module and the previous Cardano.Ledger.[Era].Core modules from each era.

Technical debt in progress

  • [issue-3034][issue-3035][node-issue-4421] We are continuing to write benchmarks to understand exactly where all the time is being spent on executing the TICKF transition. The consolidation of the per-stake-credential stake distribution to the per-stake-pool distribution does seem to account for a large amount of time (near a second as written, which we have down to about half a second with some optimizations), but this does not account for everything. Applying the reward update may also be a big contributing factor.
  • [pull-3033][pull-3038][pull-3041] A separate team is working on upgrading all the cardano-node repositories to work with ghc 9.2.4. We have been helping out with this effort.
  • The nix scripts used to build our new formal ledger model do not work consistently for everyone, and we have been working on fixing these issues.
  • [issue-3014] We are still working on adding a versioning scheme to all of the ledger serializers.
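
The consolidation step mentioned in the TICKF benchmarking item above can be pictured with a small sketch. This is not the ledger's code (which is written in Haskell); the function name, the dictionary layout and the sample values are all hypothetical, and the only point is that the work grows with the number of stake credentials, since each one is visited once.

```python
from collections import defaultdict


def aggregate_stake_by_pool(stake_by_credential, delegations):
    """Sum per-credential stake into a per-pool total.

    stake_by_credential: dict mapping a credential to its stake (an int).
    delegations: dict mapping a credential to the pool it delegates to.
    Credentials without a delegation contribute to no pool.
    (Both layouts are assumptions made for this sketch.)
    """
    pool_stake = defaultdict(int)
    for credential, stake in stake_by_credential.items():
        pool = delegations.get(credential)
        if pool is not None:
            pool_stake[pool] += stake
    return dict(pool_stake)


# Hypothetical sample data: four credentials, three of them delegated.
example_stake = {"cred-a": 100, "cred-b": 250, "cred-c": 40, "cred-d": 7}
example_delegations = {"cred-a": "pool-1", "cred-b": "pool-2", "cred-c": "pool-1"}
pool_totals = aggregate_stake_by_pool(example_stake, example_delegations)
```

A single pass like this is cheap per entry, so a cost approaching a second would come from the sheer number of credentials rather than from any one step.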

· 3 min read
Marcin Szamotulski

Network Update

Ouroboros Network

Ouroboros Consensus

  • Recently we found out that consensus does not log exceptions thrown during initialisation. This was fixed in PR input-output-hk/ouroboros-network#4015. As part of this pull request, we also changed the connection handler thread so that every exception it rethrows is wrapped in ExceptionInHandler.

Some older items, which were not announced

  • We identified and fixed an issue related to socket activation (socket options were not set for sockets passed through socket activation). PR input-output-hk/cardano-node#3979. This fix will be released in the next cardano-node release.

Cardano Node

  • We extended the NixOS service module so that one can modify the socketPath, runtimeDir, databasePath, traceSocketPathAccept, traceSocketPathConnect and stateDir options. PR input-output-hk/cardano-node#4196

IO-Sim

We resolved a number of issues before the release of io-sim on Hackage:

See PR #24.

We also improved the experience for contributors to io-sim and typed-protocols by adding issue templates:

Typed Protocols

Input Endorsers Simulation

New features include:

  • Histograms of block arrival frequency, for both network (inbound) and CPU (block validation). These are useful for checking that we are not overloading the CPU block validation capacity or the network link capacity, or alternatively for observing the behaviour in an overload situation if we set the block generation rate high enough.

  • Pie chart of utilisation of TCP links. This shows how small a fraction of links are being used at any one time, and shows that once the system "warms up" and is operating stably, most block delivery is ballistic.

  • Showing off the new screen layout combinators, which let us put multiple charts, titles etc. on screen at once and scale them to whatever screen or video resolution we like without having to tweak numbers (this example is scaled to fit 1080HD video resolution).

· 4 min read
Damian Nadales
  • We proposed a fix for the performance degradation observed when running distributed multi-node benchmarks in the UTxO HD feature branch. While this fixed the problems observed when running local benchmarks, it broke the ThreadNet tests due to concurrency issues. Therefore, we think it is wise to start redesigning the UTxO HD mempool integration.
  • We did several rounds of code review on the alternative implementation of diff-sequences required by the UTxO HD feature, which is based on the idea of anti-diffs. This alternative implementation is close to being merged. The next step is to integrate it into the UTxO HD branch, so that we can run ad-hoc replaying and syncing-from-scratch benchmarks and compare these with the baseline. The micro-benchmarks we developed for the alternative implementation show speedups of up to 4x, so we are optimistic about the performance of the replaying and syncing-from-scratch benchmarks. However, it is important to note that, due to the nature of UTxO HD, we will still be slower than the baseline.
  • The final draft of the Genesis implementation specification is ready for review.
  • We implemented a prototype for the happy path of Genesis' ChainSync Jumping (CSJ). The prototype is slower than the baseline. However, it is not the latest version of the prototype, and the jump interval is very small.
  • Work on integrating Conway has stopped since priorities have changed.
  • We started work on benchmarking epoch boundaries and epoch overhead (pr-4014). To this end, we made use of a modified version of our db-analyser tool. We ran the new benchmarking setup using the Cardano mainnet chain, and we can see that block tick and application take substantially longer at epoch boundaries, although there are a couple of slots during an epoch in which these computations take longer than normal. We notified the ledger team about these findings. We will use this modified version of db-analyser to investigate the epoch overhead.

Workstreams

UTxO HD

  • We spent quite some time investigating the root cause of the degradation in performance observed in the benchmarks. We ran the make forge-stress benchmarks locally in order to debug this behavior.

    • Transaction batching doesn't make a notable difference in the outcome (considering we are using the in-memory backend).

    • The mempool batching implementation required asynchronous transaction validation, which violates the LocalTxSubmission protocol contract. Continuing on that route would therefore have had quite a big impact.

    • The STM logic we implemented by using a TMVar for the mempool internal state was buggy and under certain circumstances it seemed to lock. Reverting the mempool internal state to be stored in a TVar seems to solve this problem.

    • The results we get after this change look almost identical to the ones from the baseline.

  • The anti-diff prototype (PR #3997) has been reviewed and is close to being merged.

    • A follow-up issue (issue #4010) to integrate the anti-diff prototype in the various consensus packages was created. A first version of the integration exists, and all tests pass. A next step is to get some indication of the "real" performance gain by profiling db-analyser (or cardano-node).
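
To make the anti-diff idea above a little more concrete, here is a minimal sketch. It assumes that "anti-diff" refers to diffs that have an inverse, so that a running total over a window of diffs can be updated by cancelling the oldest diff instead of re-summing everything that remains. The integer-delta representation and every name below (`combine`, `invert`, `DiffWindow`) are hypothetical simplifications, not the data structure from PR #3997.

```python
from collections import deque


def combine(a, b):
    """Add two diffs (dict key -> integer delta), dropping entries that cancel."""
    out = dict(a)
    for key, delta in b.items():
        total = out.get(key, 0) + delta
        if total == 0:
            out.pop(key, None)
        else:
            out[key] = total
    return out


def invert(diff):
    """The anti-diff: combining a diff with its inverse yields the empty diff."""
    return {key: -delta for key, delta in diff.items()}


class DiffWindow:
    """A queue of diffs plus their running total (hypothetical helper)."""

    def __init__(self):
        self.diffs = deque()
        self.total = {}

    def push(self, diff):
        self.diffs.append(diff)
        self.total = combine(self.total, diff)

    def pop_oldest(self):
        # Cancel the oldest diff out of the total instead of recomputing
        # the sum of all remaining diffs.
        oldest = self.diffs.popleft()
        self.total = combine(self.total, invert(oldest))
        return oldest


window = DiffWindow()
window.push({"a": 5})
window.push({"a": -5, "b": 2})
window.push({"c": 1})
window.pop_oldest()

# Recomputing from scratch gives the same total as the incremental update.
recomputed = {}
for d in window.diffs:
    recomputed = combine(recomputed, d)
```

In this toy version, dropping the oldest diff costs work proportional to that one diff rather than to the whole window, which is the kind of saving an invertible representation makes possible.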

Genesis

  • Final draft of the Genesis implementation specification, now up for review.

  • Local benchmark setup for parameter tuning via the happy path ChainSync Jumping (CSJ) prototype (Issue 3987).

    • Context: Our Genesis design requires us to check in with a large number (~20) of servers periodically while syncing. These servers are offered jump requests via the ChainSync protocol (hence the name), which they can accept or decline. If a peer declines, the Genesis rule allows us to determine whether a node actually has a better chain.

    • The "happy path" is when no peer declines a jump. We want this to have close to no overhead compared to status quo, i.e. syncing without Genesis.

    • We implemented a prototype for this happy path, and are now starting to test in various configurations (number of peers, latency, bandwidth) to tune the performance of ChainSync jumping, i.e. how complicated our logic of choosing when to jump needs to be.

      Example:

    • Simulated connection: 50 MBit/s, 50ms latency

    • Jump interval: 3000 slots (on the low end, could be increased to up to 3k/f)

    • Red: baseline (1.35.3), one peer in topology file

    • Blue: Preliminary version of our prototype, with 10 peers.

      It is slower by about 30%, but it is not the latest version of the prototype, and the jump interval is very small, making CSJ more of a bottleneck.
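
The effect of the jump interval in the example above can be estimated with a small model. This is a sketch under stated assumptions, not the prototype: `happy_path_sync`, its arguments and the 1,000,000-slot chain are hypothetical, and k = 2160 and f = 1/20 are the mainnet security parameter and active slot coefficient as we understand them, used only to evaluate the 3k/f upper bound.

```python
from fractions import Fraction

K = 2160              # assumed mainnet security parameter
F = Fraction(1, 20)   # assumed mainnet active slot coefficient (0.05)

MAX_JUMP_INTERVAL = int(3 * K / F)  # the 3k/f upper bound, in slots


def happy_path_sync(tip_slot, jump_interval, peers_accept):
    """Model syncing to tip_slot while offering a jump to every peer each interval.

    peers_accept: list of functions slot -> bool saying whether a peer accepts
    a jump to that slot. Returns (slot_reached, jumps_offered). The model stops
    at the first declined jump, because that is where the happy path ends.
    """
    slot = 0
    jumps = 0
    while slot + jump_interval <= tip_slot:
        slot += jump_interval
        jumps += 1
        if not all(accepts(slot) for accepts in peers_accept):
            return slot, jumps
    return tip_slot, jumps


# Ten peers that accept every jump (the happy path).
agreeable = [lambda slot: True] * 10

small_interval = happy_path_sync(1_000_000, 3000, agreeable)
large_interval = happy_path_sync(1_000_000, MAX_JUMP_INTERVAL, agreeable)

# One peer that declines any jump at or beyond slot 6000.
one_decliner = agreeable[:9] + [lambda slot: slot < 6000]
declined = happy_path_sync(1_000_000, 3000, one_decliner)
```

With these numbers, the 3000-slot interval produces 333 rounds of jump requests over the hypothetical chain, against 7 rounds at the 129,600-slot bound, which is consistent with the remark that a very small interval makes CSJ more of a bottleneck.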

Technical debt

  • Fix flakiness in ChainDB QSM tests (PR 3990).

· 2 min read
Kostas Dermentzis

DBSync Update

New Tag

We created a new db-sync tag 13.0.5 which addresses shortcomings of the last release 13.0.4. It is currently under testing. The Changelog is here; in more detail:

  • We fixed fees for tx with phase 2 failure that didn't include a total collateral field. 1248

  • We fixed an issue that could cause db-sync to crash if a specific rollback occurred. 1247

  • DBSync will now avoid reserialising data, especially Datums, which not only slows down db-sync but could result in the wrong CBOR encoding being inserted. 1217

  • All the fixes above come with unit tests that validate them.

  • Added support for preprod and preview from docker. DBSync no longer needs to include the configs for different networks; these are fetched directly from the cardano world. 1254

  • We added better support from docker for the new disable options and the overall documentation. 1260

All the above were also backported to the master branch.
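
The reserialisation fix listed above is easier to appreciate with a concrete case. CBOR allows the same value to be encoded in more than one way, so decoding bytes and encoding them again need not give the original bytes back, and any hash computed over those bytes changes with them. The sketch below hand-rolls just enough CBOR to show this for unsigned integers; it is not db-sync code, the helper names are hypothetical, and the use of a 32-byte BLAKE2b digest is only an assumption about how such data would be hashed.

```python
import hashlib


def decode_uint(data: bytes) -> int:
    """Decode a single CBOR unsigned integer (major type 0)."""
    first = data[0]
    if first >> 5 != 0:
        raise ValueError("not a CBOR unsigned integer")
    info = first & 0x1F
    if info < 24:
        return info                      # value stored in the initial byte
    if info > 27:
        raise ValueError("unsupported additional information")
    width = 1 << (info - 24)             # 1, 2, 4 or 8 argument bytes
    return int.from_bytes(data[1:1 + width], "big")


def encode_uint(value: int) -> bytes:
    """Encode an unsigned integer using the shortest CBOR form."""
    if value < 24:
        return bytes([value])
    for info, width in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * width):
            return bytes([info]) + value.to_bytes(width, "big")
    raise ValueError("value too large for a CBOR unsigned integer")


def digest(data: bytes) -> str:
    return hashlib.blake2b(data, digest_size=32).hexdigest()


# The value 1 stored with a two-byte argument: well-formed CBOR, but not the
# shortest encoding.
original = bytes([0x19, 0x00, 0x01])
value = decode_uint(original)
reserialised = encode_uint(value)

same_value = decode_uint(reserialised) == value
same_bytes = reserialised == original
same_hash = digest(reserialised) == digest(original)
```

Here the value round-trips but the bytes and the digest do not, which is why keeping the original bytes is the safer choice.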

Open source

We made the db-sync board public, so everyone can have access to the issues we prioritise. We also added new tasks to the board, some of which could be approachable for newcomers or people who want to contribute.

Progress on tech debt and new features

  • 1223 was merged, which removes the foreign keys from the db schema. This opens the road to a number of optimizations.

  • An additional fix on top of the previous work was added 1250

  • An initial version where DBSync does not roll back on restart is done here 1266. This allows db-sync to restart much faster, without the need to delete data and reinsert it. In the future it can also facilitate migrations in cases where the ledger snapshots have a breaking change, without the need to resync everything from genesis.