
· 2 min read
Serge Kosyrev

High level summary

  1. Benchmarks for the first pre-release component bump of the upcoming 1.36 release have been delivered, and the data shows the bump is clear for release.
  2. SECP benchmarking enablement is underway: the necessary generator features have been implemented, and are now being integrated into the workbench.
  3. The new tracing system: in response to the previously discovered performance regression, we are working on pre-planned implementation improvements and running more benchmarks.
  4. Infrastructure: the Nomad-based workbench backend has been brought closer to a cloud deployment scenario. Cleanup in preparation for Cicero CI/CD integration has started.
  5. Open sourcing: ongoing SRE collaboration on production deployment of performance data publishing.

Performance

We have run benchmarks for the first component bump of the upcoming 1.36 release, and we don't see any significant performance changes. The component bumps are therefore clear for release.

Tracing

Even before we spotted the tracing system regression, we already had plans for further efficiency improvements, and we are now actively pursuing them. The idea is to collect more statically available information to enable shifting more tracing decisions from message delivery time to configuration time.
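
As a rough illustration of the direction (a minimal, hypothetical sketch with made-up names, not the actual API of the new tracing system): when the relevant information is known statically from the configuration, the filtering decision can be resolved once per tracer, so the hot path pays nothing for messages that would be discarded anyway.

    -- Hypothetical sketch: resolve the filtering decision at configuration time
    -- instead of re-evaluating it for every traced message.
    data Severity = Debug | Info | Warning | Error
      deriving (Eq, Ord, Show)

    data TraceConfig = TraceConfig
      { tcEnabled     :: Bool      -- statically known from the configuration
      , tcMinSeverity :: Severity
      }

    newtype Tracer m a = Tracer { traceWith :: a -> m () }

    -- Configuration-time decision: a tracer that is disabled, or whose severity
    -- sits below the configured threshold, is replaced by a no-op tracer, so no
    -- per-message check (or message rendering) happens at delivery time.
    filterAtConfigTime :: Applicative m => TraceConfig -> Severity -> Tracer m a -> Tracer m a
    filterAtConfigTime cfg sev tr
      | tcEnabled cfg && sev >= tcMinSeverity cfg = tr
      | otherwise                                 = Tracer (\_ -> pure ())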

To support this effort, we have also started running more benchmarks and have enhanced the data analysis with relevant metrics.

Infrastructure

Generation support for Plutus V2 has been implemented, so with the help of the previously developed looped signature-verifying script, the generator is now capable of producing two SECP workloads: one verifying ECDSA signatures and one verifying Schnorr signatures. This is now being integrated into the infrastructure: the generator parametrisation API is being enhanced and the workbench is being extended to handle the new parametrisation.
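
For illustration only (the names below are hypothetical, not the actual tx-generator or workbench API), the new parametrisation essentially amounts to choosing one of the two signature-verification workloads and threading that choice through to the looped Plutus V2 script:

    -- Hypothetical sketch of the two SECP workload flavours; the real
    -- generator parametrisation API will differ in detail.
    data SecpWorkload
      = VerifyEcdsa    -- loop the Plutus V2 script over ECDSA secp256k1 verifications
      | VerifySchnorr  -- loop the Plutus V2 script over Schnorr secp256k1 verifications
      deriving (Eq, Show)

    data GeneratorParams = GeneratorParams
      { gpWorkload       :: SecpWorkload
      , gpLoopIterations :: Int   -- verifications per script execution
      }
      deriving (Eq, Show)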

In addition, the workbench is now being enhanced to handle protocol-version-based choices for the Plutus cost model.

The intermediate cloud compatibility iteration of the workbench cloud enablement effort was merged. We are now doing some cleanup in preparation for starting the Cicero backend, which will take us most of the way to full CI/CD integration.

We continue to collaborate with SRE on production deployment of data publishing. We now have a gradual rollout plan, which aligns with the planned availability of SRE infrastructure features.

We are working on recovering the software dependency manifest feature that was lost with the organisation-wide transition to CHaP.

As usual, a number of smaller workbench, data analysis & reporting improvements have been made.

· 4 min read
Marcin Szamotulski

Stake-Driven Data Diffusion Release for Relays

The IOG networking team decided to release Stake-Driven Data Diffusion with Robust Optimised Peer Selection, more commonly known as P2P, for relays. In the last update we reported a performance regression; it turns out it only affects block producers, and thus we strongly advise against running P2P on block-producing nodes. Further investigation is required to find its cause.

On IOG's benchmarking cluster we have seen quite a good improvement in block propagation itself. The cluster runs a static topology with valency 6 (each node is connected to 6 other nodes), in which all 50 nodes are block producers; otherwise the network is set up the same way as mainnet. We have seen a 40-50% improvement in block propagation compared to the same cluster deployed with the same topology but using non-P2P nodes. We think this improvement is caused by the use of full duplex connections: quite likely the transaction traffic flowing in both directions on the same TCP connection helps to keep the TCP window open. Note that in a cluster of 50 nodes with valency 6, the probability that a node has at least one duplex connection is more than 50%. We don't expect the same improvement on mainnet, because the network is much wider and the transaction traffic is not as large.
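
To see where the "more than 50%" figure comes from, here is a back-of-the-envelope check (assuming each node picks its 6 upstream peers uniformly at random from the other 49 nodes, which is a simplification of the actual static topology):

    -- A given upstream peer picks us back with probability 6/49, so the chance
    -- that at least one of our 6 upstream peers reciprocates (yielding a duplex
    -- connection) is 1 - (1 - 6/49)^6.
    pAtLeastOneDuplex :: Double
    pAtLeastOneDuplex = 1 - (1 - 6 / 49) ** 6   -- ≈ 0.54, i.e. just over 50%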

Just before the release we squashed a few small bugs:

  • issue #4163 - top level integration bug in keep-alive;
  • issue #4177 - a bug in outbound-governor;
  • PR #4165 - a fix for cardano-ping's support of NodeToNodeV_10.

Peer Sharing

We have been reviewing the peer sharing PR.

DeltaQ

Neil Davies was invited to give a guest lecture entitled Avoiding System Catastrophes at UCLouvain.

What we achieved last sprint

  • issue #4163: we found out that a control message was not passed to the keep-alive mini-protocol; this resulted in every demotion running into the demotion timeout rather than terminating gracefully. With the fix the node will no longer log:

    { "kind": "PeerStatusChangeFailure"
    , "peerStatusChangeType": "WarmToCold (ConnectionId {localAddress = 192.168.0.10:7000, remoteAddress = 3.129.186.40:3000})"
    , "reason": "TimeoutError"
    }
  • issue #4177: we fixed an assertion failure in the outbound-governor; we no longer try to demote peers which are already being demoted.

  • PR #4155: we refactored the ouroboros-network packages. There is now a top-level ouroboros-consensus-diffusion package which integrates network & consensus code. We also introduced:

    • ouroboros-network-api package, which contains the API shared between network & consensus;
    • ouroboros-network-mock package, which contains a mock API used for testing (e.g. a mock chain & chain producer, etc.);
    • ouroboros-network-protocols package, which contains implementations of all mini-protocols (except handshake), exposes a testlib, and contains test and cddl components.

    This made the dependency tree of network & consensus packages much cleaner.

  • PR #4169: we described the usage of release branches in CONTRIBUTING.md doc.

  • PR #4165: we fixed cardano-ping support of the NodeToNodeV_10 protocol.

DeltaQ

The abstract of the talk:

An essential step to ensuring that distributed systems are fit for purpose.

Distributed systems have become an integral part of our society and daily lives. We are, both implicitly and explicitly, individually as well as collectively, placing ever more trust in them.

Are they worthy of this trust? Our need for them to be ‘fit-for-purpose’ goes well beyond notions of functional correctness (i.e. never getting the wrong answer). We need them to deliver the desired outcomes in a timely, robust, reliable, resilient fashion, at scale and in a sustainable way (both economically and environmentally).

This all sounds like a worthy aspiration, but what would be a practical approach to capturing and reasoning about these issues? How can we ensure that systems can meet their fit-for-purpose objectives, not just in their design but as they are deployed, encounter the imperfect world, are scaled to become economic, and proceed into ongoing maintenance?

This talk will illustrate how the notions of Outcomes and Quality Attenuation (as captured by ‘∆Q’) are being used to both frame the necessary notions and provide a basis for assuring the refinement and reification of such systems, from initial concept to operational infrastructure.

You can download the slides from here.

· 2 min read
Iñigo Querejeta Azurmendi

High level summary

The four open fronts that the crypto team is working on are:

  • MuSig2: We are close to the point where the MuSig2 library is ready for use by the Hydra team.
  • Mithril: We have started thinking about how Mithril-core can be designed so that it can be leveraged in contexts where the verifiers run full nodes.
  • cardano-base: The VRF and BLS branches are still open and in progress.
  • KES agent: We keep making progress on the KES secure forgetting implementation as well as the KES agent.

Low level summary

MuSig2

  • We redesigned the library so that MuSig2 users don't need to be aware of the underlying secp256k1 library (PR#31).
  • We are introducing a more granular error handling mechanism (PR#33).
  • We rethought the API and made it more consistent with the underlying secp256k1 library (PR#35).

Mithril

  • The Mithril crates will generally be published on crates.io, and we adapted the core library's README accordingly (PR#616).
  • We are modifying the individual signature so that it no longer contains the VK and stake. Including them was unnecessary, as the current design already requires the aggregator of Mithril certificates to know this information (PR#620).

cardano-base

  • We are still working on updating to the latest version of the VRF. In particular, we modified the cbits to use the latest stable version of libsodium (1.0.18) (PR#341).
  • SKs, VKs, and VRF outputs will be compatible across the different versions. We are implementing conversion functions for simple transitions (PR#344).
  • Benchmarks on the pairing built-ins have already started, so we are finalising some CI concerns and final remarks on the BLS PR, so that it can be merged as soon as we get the green light from Plutus (PR#266).

KES agent

  • We keep making progress on the secure forgetting PR and are resolving some bugs in memory handling (PR#255).
  • We are extending the test framework to make sure concurrency is properly handled by the KES agent, for which we are including refcounted references. General progress continues on the implementation.

· One min read
Sebastian Nagel

High-level summary

This week, the Hydra team attended the Cardano Summit in Lausanne, where Sebastian gave a presentation about Hydra and the whole team connected with the Cardano Community. After the public event, the Hydra team also held a workshop, which provided room for a retrospective, various planning sessions, and hacking together on different ideas.

What did the team achieve this week

What are the goals of next week

  • Monthly report & review meeting
  • Tie up several loose ends / branches.
  • Resolve Tx validity discussions & PRs.
  • Review cicero PR & try it out.