· One min read
Sebastian Nagel

High level summary

This week, the Hydra team completed several user experience improvements to the hydra-tui and hydra-node, and delivered a first version of persisted head states by publishing release version 0.8.0. Besides this, they met with researchers on the topic of the HeadV1 specification and kicked off work on the RFP for an external audit of the Hydra Head protocol and implementation.

What did the team achieve this week

  • Completed the UX improvements on the hydra-tui
  • Released version 0.8.0, which delivers a first version of persisted head states
  • Met with researchers on the HeadV1 specification
  • Started work on the RFP for our external audit

What are the goals of next week

  • Complete ADR18 implementation and get it merged
  • Start work on event-sourced persistence #580
  • Have a first plutus script gap closed #452
  • Revamp CI to use flakes and build macOS artifacts (stretch goal: migrate to Cicero for Nix builds)

· 3 min read
Marcin Szamotulski

High-level summary

The team has focused on debugging & fixing bugs for the P2P single relay release, which included:

  • diagnosing, fixing and writing tests for a bug in peer-state-actions which fortunately hasn't been released;
  • diagnosing & preventing misconfiguration of DNS.

We also worked on developing peer sharing and held a session with the scientists on eclipse evasion.

Detailed description

P2P Network Stack

During the past two weeks the team focused on the P2P single relay release and peer sharing. We found and fixed an important bug recently introduced in one of the components of the P2P networking stack (fortunately never released). Together with the fix, we designed a diffusion simulation unit test as well as a QuickCheck property test (both can reproduce the bug). We also changed the code so that if such a bug is reintroduced in the future, it will be easy to diagnose. For more see:

An initial benchmarking run of the P2P code was executed. The results were unlike what we see on the mainnet. We found a possible misconfiguration of the cluster (caused by a TTL of 0 on domain names), which could be the direct cause. We wrote a PR which rules out such misconfiguration. We are awaiting the next benchmarking results. See more at:

ouroboros-network#4106
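
To illustrate why a TTL of 0 is a problem: a client that honours it has to re-resolve the name on every use. The following is a minimal Python sketch of one possible guard, assuming the approach is to clamp the TTL into a bounded range. The function name and both bounds are hypothetical and are not taken from ouroboros-network#4106.

```python
# Hypothetical bounds, chosen for illustration only.
MIN_TTL_SECONDS = 60
MAX_TTL_SECONDS = 24 * 60 * 60


def clamp_ttl(ttl_seconds):
    """Force a DNS TTL into [MIN_TTL_SECONDS, MAX_TTL_SECONDS].

    A TTL of 0 would otherwise mean "never cache", so every lookup
    would hit the resolver again.
    """
    return max(MIN_TTL_SECONDS, min(ttl_seconds, MAX_TTL_SECONDS))
```

With these assumed bounds, a misconfigured TTL of 0 is treated as 60 seconds, while values already inside the range pass through unchanged.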

We also started working on the P2P single relay release. The PR ouroboros-network#4120 includes 108 patches cherry-picked from the master branch. We started integrating these changes into the release branch of cardano-node. Early next week we ought to have an early version of cardano-node with non-experimental P2P support!

For a more detailed release plan, please see the P2P - Single Relay issue.

Consensus

We identified and fixed missing error reporting in the consensus initialisation phase. See more at ouroboros-network#4015.

Cardano Node

We also made changes in cardano-node to give node operators a better experience. This includes updating the severities of some of the traces as well as implementing a new format for the P2P topology file. For more see:

Peer Sharing

We continued working on the implementation of peer sharing. We have an early implementation which will be reviewed and analysed in the coming weeks. We started working on the cardano-node integration. We need PR #4392 to be merged before that integration can land in cardano-node, although this is not blocking us currently. See more at:

Eclipse Evasion

We held a session which included Alexander Russell, Sandro Coretti-Drayton and Nick Frisby from the consensus team. We discussed the high-level design of the eclipse evasion scheme, which is important for the design and implementation of ouroboros-genesis. We got positive feedback from the researchers.

IO-Sim

In this period we made little progress towards releasing IO-Sim on Hackage. The only change was a single PR which added a few missing instances for the STM monad.

Open Source

We made sure the CI runs for PRs that come from forks (which is important for accepting contributions from third parties).

Mithril Cardano Integration

We held initial discussions with Arnaud Bailly about a possible path to integrate Mithril into cardano-node and take advantage of the ouroboros-network diffusion layer.

· 2 min read
Serge Kosyrev

High level summary

On the performance side, the team ran benchmarks for the P2P feature and the 1.35.4 release. We finished a prototype for performance data publishing. We almost finished the local deployment backend for the workbench using the new SRE deployment infra. We worked on fixing and improving our data analysis pipeline.

On the tracing side, the team worked on isolating a critical issue causing message loss in the remote tracing backend. The issue was resolved and we now have proper end-to-end coverage for the scenario.

Executive summary

  • The new tracing system public release is getting closer, as we're resolving remaining rough edges that are discovered in full-scale deployments. The local benchmarks we ran were already showing improvement relative to legacy tracing, so we expect similar results at full scale.
  • The first (local deployment) iteration of benchmarking adopting the new SRE deployment infra is nearly done. We thank Michael Fellinger and Robin Stumm for their assistance. Two further phases remain: CI integration and cloud deployment.
  • The benchmarking data publishing prototype is ready. This serves as a springboard for both opening our performance assessment workflow (to support the wider Cardano developer community), and for data provision to the business community. Our next steps are to secure a permanent deployment for this mechanism and to integrate it into the benchmarking infrastructure. This requires collaboration with SRE.

· 4 min read
Michael Fellinger

High level summary

The SRE team is heavily working on the Equinix Metal migration, replacing Hydra with Cicero, and a new version of Spongix.

Lower level summary

OpenZiti

  • Work is ongoing on our OpenZiti integration into Bitte in [bitte-zt].
  • CI-World deployment of Darwin CI Ziti service in [ci-world-commit-d40f4d].
  • Filed multiple issues and had a lot of discussion with the OpenZiti developers; we're making pretty rapid progress thanks to them.
  • Work on getting Equinix baremetal machines integrated into AWS World Bitte clusters, utilizing a Ziti ZTNA network overlay to bridge the networking of the two environments and extending IAM to Equinix machines for Nomad client onboarding.
  • A Nix Flake for most of our OpenZiti dependencies including the Console, Controller, Edge Tunnel, and Router is now at [openziti-bins].
  • The Flake also includes WiP NixOS modules for these components.
  • Tested Ziti Desktop Edge official app for Darwin x86_64 w/ GUI -- works with no issues seen so far
  • Moved the console to traefik routing service (zac.$DOMAIN) and controller/edge router stay at zt.$DOMAIN, but have registered consul services

Cicero & Tullia Integrations

Cicero & Tullia Features

  • Improvements to Tullia task aggregation to make [cardano-addresses] build correctly.
  • Better tullia CUE lib default for tags [tullia-commit-4df3c5d].
  • Put cache.nixos.org back in cache.iog.io's upstreams. This is now considered a public cache again, and without it some Cicero evaluations had to build huge packages.
  • Started working on a flake-parts module for Tullia.
  • Started working on cutting down Tullia task build time by putting facts in JSON files.
  • Fixed running into the kernel argument limit by reading Tullia's DAG from a file.
  • Merged [tullia-pull-9] that fixes several issues related to error reporting and escaping.
  • Added Mac builders in Cicero on CI-World.
  • Started work on Tullia invocation caching.
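
The kernel argument limit fix listed above follows a common pattern: instead of passing a large payload on the command line, write it to a file and pass only the path. Below is a minimal Python sketch of that pattern. The `--dag-file` flag, the `run-task` command name and both helper functions are hypothetical and do not reflect Tullia's actual interface.

```python
import json
import os
import tempfile


def argv_with_dag_file(command, dag):
    """Write the DAG to a temporary JSON file and return an argv list
    that references the file path instead of embedding the DAG."""
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as handle:
        json.dump(dag, handle)
    # "--dag-file" is a hypothetical flag used only for this sketch.
    return [command, "--dag-file", path]


def read_dag(path):
    """Load the DAG back from the JSON file on the receiving side."""
    with open(path) as handle:
        return json.load(handle)
```

The argv stays three elements long no matter how large the DAG grows, so its size no longer counts against the kernel's limit on argument length. The caller is responsible for removing the temporary file once the task has read it.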

Spongix

  • A lot of progress on an SQLite-backed version of Spongix. It already supports the full HTTP binary cache protocol but still lacks comprehensive testing and some tuning, as well as recursive lookups.
  • First steps in the implementation of the nix-daemon ssh-ng protocol so Spongix can be used via SSH and we can get rid of basic auth.

Bugs

  • Discovered a Cicero bug where Nomad reschedules cause the GitHub commit status to get stuck in pending.
  • Discovered a Cicero race condition bug around concurrent transactions for codependent actions.
  • Fixed a Tullia task order bug in [cardano-addresses].
  • Diagnosed a Cicero action not being triggered in [abcirdc].
  • Fixed meta/description of the Tullia package in [tullia-pull-7].
  • Added Vault token loop alerts in [bitte-cells-pull-40].
  • Ongoing investigation on recurring Patroni and nomad-follower issues related to token rotation.

· 2 min read
Iñigo Querejeta Azurmendi

High level overview

The crypto team is primarily focusing on enabling SECP primitives and preparing the KES agent. We are close to meeting the acceptance criteria in cardano-base. What remains is to address some editorial comments on the style of dQuandrant's PR and to include one additional test; after that we can mark it as done. For the KES agent, we are still iterating on the best design of the solution, but also progressing on the implementation.

Low level overview

SECP built-ins

  • (missed in the last two weeks' update) The audit was successfully completed by bCryptic, and some minor changes were addressed in PR 313
  • CIP-0049 was addressed in the editors meeting, and PR 250 was merged
  • The unit-tests PR 320 is opened. Some editorial concerns still need to be addressed, and an additional (negative) test has been requested for addition.

KES agent

  • We were investigating how to send OpCerts to KES agents, but this turns out not to be necessary. OpCerts can be stored on-disk, so the agent does not need to be aware of them.
  • We are redesigning the architecture. Instead of connecting the control server to the agent, and then the latter to the node, we are directly connecting the control server to the node, and the latter to the agent(s).