Skip to main content

· 5 min read
Michael Karg

High level summary

  • Benchmarking: We've performed and analysed benchmarks in the Conway era, with DReps injected.
  • Development: Tracing DRep data has been implemented; improved error reporting in tx-generator and analysis quick queries are ongoing work.
  • Workbench: We now fully supports the new CLI create-testnet-data command and DRep injection into Conway genesis. Haskell profile definition work is ongoing.
  • Tracing: Various additions to Node metrics are being worked on, such as build info and block producer role. Metrics naming will be further harmonized.
  • UTxO Growth: We've finalized analysis and reports of all benchmarks targeting UTxO scaling scenarios.
  • UTxO-HD / LMDB: We've performed multiple runs benchmarking the LMDB (on-disk) backend of UTxO-HD.

Low level overview

Benchmarking

We've run and analyzed a full set of benchmarks comparing the Conway ledger against the Babbage one, on Node 8.10.1-pre. For Conway, our additional goal was to measure a vanilla ledger state against one with a large amount of DReps - and delegations to those DReps - present. The benchmarks used our existing value and Plutus workloads to remain comparable to each other.

Development

Additional ledger queries for the tracing system have been implemented and merged to master. Those capture the amount of, and the number of existing delegations to, DReps as trace output - and thus enable creating a metric on top of it, which can then be monitored.

The (in our case) non-deterministic nature of shutting down different cluster setups - both local and cloud-based - carries the possibility that our transaction generation service occasionally misclassifies a regular shutdown as an error. Furthermore, in the case of network malfunctions, the service's errors are too unspecific. By implementing thread labels for submission threads, corresponding to each submission target, and by adding custom smart signal handlers, we'll improve the generator's error reporting significantly.

The initial tests for quick queries are being developed further. We're moving towards a principled, and generalized, syntax that supports both prepared, parametrizable queries from the application code, as well as ad-hoc queries stated e.g. on the command line.

Workbench

The performance workbench now fully supports the new cardano-cli command create-test-data. We use it to inject both stake delegated to stake pools into genesis, and - recently added - stake delegated to DReps as well. It has been proven very useful and versatile so far, and will eventually replace the current create-staked command.

Work on porting our performance workbench's profile definitions to Haskell, and providing them with an appropriate test suite, is still ongoing; currently, we're integrating all new profile families that came out of the UTxO growth scenarios.

Tracing

New metrics are being implemented for the tracing system. They will also be part of Prometheus output and as such accessible to monitoring services. There'll be cardano-node's detailed build info, as well as a node's block producer status, meaning the presence of forger credentials. Those new metrics are being backported to the legacy tracing system, too.

Furthermore, we've determined the need to revisit metrics naming. There's still a divergence between naming in the legacy and the new system. While this could be mitigated by passing in extra config options, we think that a transition to the new system should not impose any unnecessary effort for node operators. A design to fully harmonize the existing naming schemata is currently being set up.

UTxO Growth

The UTxO Growth benchmarking series has been finalized. We've finished analyses and reports for all scenarios that were tested and explored.

The overarching questions were, given a network of 32GB host systems, how large can the UTxO set grow in general, how large can it grow before the nodes have to operate close to the RAM limit over extended periods of time, and how does scaling the UTxO set size affect network metrics, such as block diffusion.

A dedicated "UTxO Scaling Squad" was set up, who was driving the entire process, and we enjoyed a very focused and productive collaboration with them.

UTxO-HD / LMDB

Last not least, we were able to benchmark UTxO-HD's on-disk backend on a network of block producing nodes, on a recent 8.9.1 version of cardano-node. The setup allowed of using a direct access SSD device for performance critical disk I/O, whereas the bulk of ChainDB and ledger snapshots remained on a standard AWS EBS volume.

The benchmarks comprised both optimistic and pessimistic RAM assumptions for the host OS to further optimize I/O via page cache, as well as medium and large UTxO set sizes - the latter almost tripling current mainnet's size. The results were promising; the LMDB backend has proven to be able to accomodate large UTxO sets using significantly less RAM than the default all-in-memory node - and with a more than reasonable trade-off performance-wise. Furthermore, running with pessimistic assumptions, the performance impact on LMDB was very moderate only.

· 3 min read
Marcin Szamotulski

High-level overview of sprint 60

Edited on 8th of May: new EGK counters will be included in `cardano-node-8.9.3`, added links to `cardano-node-8.9.3` PR and `ouroboros-network-0.15` release.

Peer-Sharing Improvements

We continued working on improving peer sharing. As part of this work light peer sharing (e.g. including inbound peers to the known set of outbound governor), was restructured. Now, sending more peers than what was requested by the peer-sharing client is a protocol error, and the connection will be terminated; This hasn't been a resource attack vector since we always limited the number of peers taken by the outbound-governor and the number of peers has always been limited by the size of the mux ingress queue reserved for peer-sharing mini-protocol. These changes will be released in cardano-node-8.9.3. See ouroboros-network#4868

We also merged the work on outbound governor counters, which initially started as just an extension for peer-sharing counters but turned into a larger refactorisation. We announced it in the previous report. These changes will be included in 8.9.3. See ouroboros-network#4845, ouroboros-network#4861.

Light peer sharing (inbound peers) refactorisation allowed us to refactor the inbound governor loop: we restructured it so that the internal state is kept pure (and thus not shared with other threads), while the public part is computed incrementally (with good amortised costs and thus leading to good performance) and exposed to other components (e.g. the outbound-governor), see ouroboros-network#4871 (which is built on top of ouroboros-network#4868).

The PR [cardano-nod#5831] integrates ouroboros-network-0.15 with cardano-node-8.9.x branch. All included PRs / issues in ouroboros-network-0.15 are listed here.

Genesis

We implemented the API needed by the consensus layer for Genesis; see ouroboros-network#4815, ouroboros-network#4846.

We continued working on outbound governor changes to support Genesis:

Bootstrap Peers

Karl Knutsson ([CF]) found and fixed some problems related to big-ledger and public root peers. Here's an excerpt from the changelog file:

  • updated the big-ledger retry state in case of an exception;
  • reset public root retry state when transitioning between LedgerStateJudgements;
  • reduced public root retry timer;
  • don't classify a config file with public-root/bootstrap-peers IP addresses only as a DNS error. See ouroboros-network#4867.

Churn

We merged a refactorisation which synchronises churn with the outbound governor, see ouroboros-network#4617.

Minor Improvements

A few other minor improvements were merged:

Testing

We added quickcheck-monoids package and also submitted an upstream patch to QuickCheck to include a version of the standard All / Any monoids, which are helpful when writing more complex properties. We will use quickcheck-monoids until the upstream PR will be released. It will be available from CHaP. See quickcheck#397.

· One min read
Sebastian Nagel

High-level summary

This week, the Hydra team has been working on refactoring and detecting network protocol version mismatches. They have also merged the /commit endpoint changes including a follow-up fix about fee calculation. Besides this, they applied minor workflow fixes by adding docker images to nix checks and disabling mithril integration testing on preview (until mithril 2418 is released).

What did the team achieve this week

  • Refactor connectivity and detect network protocol version mismatches #1381
  • Merged and completed #1350, including a follow-up fix about fee calculation
  • Add docker images to nix checks
  • Disable mithril-client testing on Preview

What are the goals of next week

  • Restructure documentation including a how to about streaming plugins #1325
  • Add arm64 docker images as requested in #1404
  • Release 0.17.0

· One min read
Damian Nadales

High level summary

  • Reworked the argument for the different databases used in Consensus, in preparation for UTxO-HD (#1059).
  • Helped review the first Peras Innovation draft report.
  • Continued working on VRF restriction based on slot distance. The corresponding PR (#1047) went through its first round of reviews.
  • Provided support to the Networking team to review their work on querying big ledger peers (#1067).
  • Continued working on open-sourcing fs-api and fs-sim.
  • Performed other minor refactorings in the codebase (#1073 and #1070).

· 2 min read
Jean-Philippe Raynaud

High level overview

This week, the Mithril team prepared a new pre-release distribution 2418.1-pre, which includes broader CPU support for pre-built binaries and a new memory allocator for the signer and aggregator nodes to prevent memory fragmentation. They also continued implementing the certification of Cardano transactions in Mithril networks and worked on scaling the signature and proof generation for mainnet by leveraging the compression of the transaction Merkle tree using sub-Merkle trees based on transaction block ranges during signature and proving. Additionally, they implemented a stream mechanism for importing transactions into the signer and aggregator stores.

Finally, the team started implementing a global Mithril networks configuration file and continued investigating some unexpected error logs occurring on the Cardano node when the signer and aggregator connect to the mini-protocols.

Low level overview

  • Created a pre-release for the new distribution 2418.1-pre
  • Completed the issue Store Block Range Merkle roots in signer and aggregator databases #1633
  • Completed the issue Stream import of Cardano transactions #1646
  • Completed the issue Memory leak in Cardano transactions signature/proof #1629
  • Completed the issue Handle unparsed blocks in Cardano transactions parser #1567
  • Worked on the issue Use Block Range Merkle roots to sign Cardano transactions #1634
  • Worked on the issue Use Block Range Merkle roots to prove Cardano transactions #1635
  • Worked on the issue Use SQLite transactions when inserting Cardano Transactions and Block Range Roots #1656
  • Worked on the issue Add Mithril networks configurations in networks.json #1638
  • Worked on the issue ChainObserver supports retrieving the Chain Point of the tip of the chain #1589
  • Worked on the issue Add section for manual setup of squid in SPO guide #1610
  • Worked on the issue Mithril Signer Local Error Policy : Error 182 - MuxError #1632