
· 4 min read
Michael Fellinger

High level summary

The SRE team is working hard on the Equinix Metal migration, on replacing Hydra with Cicero, and on a new version of Spongix.

Lower level summary

OpenZiti

  • Work is ongoing on our OpenZiti integration into Bitte in [bitte-zt].
  • CI-World deployment of Darwin CI Ziti service in [ci-world-commit-d40f4d].
  • Multiple issues filed, and a lot of discussion with the OpenZiti developers; we're making rapid progress thanks to them.
  • Work on integrating Equinix bare-metal machines into AWS World Bitte clusters, using a Ziti ZTNA network overlay to bridge the networking of the two environments and to extend IAM to the Equinix machines for Nomad client onboarding.
  • A Nix Flake for most of our OpenZiti dependencies, including the Console, Controller, Edge Tunnel, and Router, is now available at [openziti-bins].
  • The Flake also includes work-in-progress NixOS modules for these components.
  • Tested the official Ziti Desktop Edge app for Darwin x86_64 with GUI -- it works, with no issues seen so far.
  • Moved the console to a Traefik routing service (zac.$DOMAIN); the controller and edge router stay at zt.$DOMAIN, but now have registered Consul services.

Cicero & Tullia Integrations

Cicero & Tullia Features

  • Improvements to Tullia task aggregation to make [cardano-addresses] build correctly.
  • Better default for tags in the Tullia CUE library [tullia-commit-4df3c5d].
  • Put cache.nixos.org back in cache.iog.io's upstreams. This is now considered a public cache again, and without it some Cicero evaluations had to build huge packages.
  • Started working on a flake-parts module for Tullia.
  • Started working on cutting down Tullia task build time by putting facts in JSON files.
  • Fixed hitting the kernel argument limit by reading Tullia's DAG from a file.
  • Merged [tullia-pull-9], which fixes several issues related to error reporting and escaping.
  • Added Mac builders in Cicero on CI-World.
  • Started work on Tullia invocation caching.

Spongix

  • A lot of progress on an SQLite-backed version of Spongix: it already supports the full HTTP binary cache protocol, but still lacks comprehensive testing, some tuning, and recursive lookups.
  • First steps in implementing the nix-daemon ssh-ng protocol, so Spongix can be used via SSH and we can get rid of basic auth.

Bugs

  • Discovered a Cicero bug where Nomad reschedules cause the GitHub commit status to get stuck in pending.
  • Discovered a Cicero race condition around concurrent transactions for codependent actions.
  • Fixed a Tullia task-ordering bug in [cardano-addresses].
  • Diagnosed a Cicero action not being triggered in [abcirdc].
  • Fixed the meta/description of the Tullia package in [tullia-pull-7].
  • Added Vault token loop alerts in [bitte-cells-pull-40].
  • Ongoing investigation into recurring Patroni and nomad-follower issues related to token rotation.

· 2 min read
Iñigo Querejeta Azurmendi

High level overview

The crypto team is primarily focusing on enabling the SECP primitives and preparing the KES agent. We are close to meeting the acceptance criteria in cardano-base: what remains is addressing some editorial comments on the style of dQuadrant's PR and adding one more test, and then we are good to mark it as done. For the KES agent, we are still iterating on the best design of the solution, while also progressing on the implementation.

Low level overview

SECP built-ins

  • (Missed in the last two weeks' update) The audit was successfully completed by bCryptic, and some minor changes were addressed in PR 313.
  • CIP-0049 was addressed in the editors' meeting, and PR 250 was merged.
  • The unit-tests PR 320 is open. Some editorial concerns still need to be addressed, and the addition of one more (negative) test has been requested.

KES agent

  • We were investigating how to send OpCerts to KES agents, but this turns out to be unnecessary: OpCerts can be stored on disk, so the agent does not need to be aware of them.
  • We are redesigning the architecture. Instead of connecting the control server to the agent and the agent to the node, we now connect the control server directly to the node, and the node to the agent(s), as sketched below.
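A minimal Haskell sketch of the two connection topologies, purely for illustration; all type and field names here are hypothetical and not taken from the actual KES agent code.

```haskell
newtype Addr = Addr String deriving Show

-- Old design: control server -> agent -> node.
data OldTopology = OldTopology
  { oldControlToAgent :: Addr   -- the control server connects to the agent
  , oldAgentToNode    :: Addr   -- the agent connects to the node
  } deriving Show

-- New design: control server -> node -> agent(s).
data NewTopology = NewTopology
  { newControlToNode :: Addr    -- the control server connects directly to the node
  , newNodeToAgents  :: [Addr]  -- the node connects to one or more agents
  } deriving Show
```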

· One min read
Sebastian Nagel

High level summary

This week, the Hydra team reviewed and addressed several open comments on the new HeadV1 specification, completing a list of the identified gaps between specification and implementation in the process. In the wake of the recent demonstration of SundaeSwap running their DEX in a Hydra Head, the team met with them to capture feature ideas and incorporate their feedback into the roadmap, as well as to discuss potential research avenues.

What did the team achieve this week

What are the goals of next week

  • Complete the last two items required for version 0.8.0.
  • Cut the next release, version 0.8.0.
  • Get backup/recovery #187 done with proper event sourcing (ADR18).
  • Have the CI build macOS artifacts.

· 2 min read
Jordan Millar

2022-10-19 - 2022-11-01

High level summary

This sprint saw the addition of the long-awaited tx-mempool command, which allows users to query the local node's mempool for the following information (a small model of these queries is sketched after the list):

  • Ask the node about the current mempool's capacity and sizes
  • Request the next transaction from the mempool's current list
  • Query if a particular transaction exists in the mempool
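As a rough illustration of what each query returns, here is a small, self-contained Haskell model of the three queries. All of the types and functions are hypothetical stand-ins, not the actual cardano-api or cardano-cli code; the real command talks to a running node over the local transaction-monitoring protocol.

```haskell
{-# LANGUAGE RecordWildCards #-}

newtype TxId = TxId String deriving (Eq, Show)

-- Hypothetical in-memory stand-in for the node's mempool.
data Mempool = Mempool
  { capacityBytes :: Int            -- maximum size the node allows
  , pendingTxs    :: [(TxId, Int)]  -- transactions and their sizes, oldest first
  } deriving Show

-- "info": the mempool's capacity and current sizes (bytes and number of txs).
mempoolInfo :: Mempool -> (Int, Int, Int)
mempoolInfo Mempool{..} =
  (capacityBytes, sum (map snd pendingTxs), length pendingTxs)

-- "next-tx": the next transaction in the mempool's current list, if any.
nextTx :: Mempool -> Maybe TxId
nextTx Mempool{..} = case pendingTxs of
  []              -> Nothing
  ((txid, _) : _) -> Just txid

-- "tx-exists": whether a particular transaction is currently in the mempool.
txExists :: TxId -> Mempool -> Bool
txExists txid Mempool{..} = any ((== txid) . fst) pendingTxs
```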

Outside of this feature, the team has been focused on responding to user requests (e.g. exposing functions and types and implementing instances they need) and on refactoring cardano-cli/cardano-api. The metric tx_submit_fail_count has been added to the submit API so users can track how many transactions have failed. Other improvements have been made:

  • Documentation improvements
  • Release 1.35.4 was merged & released
  • Exported various types from cardano-api that were requested by community members

Completed

cardano-cli

cardano-api

cardano-submit-api

cardano-node

cardano-testnet

  • None

In Progress

cardano-cli

cardano-api

cardano-node

· 4 min read
Damian Nadales

High-level summary

During the past two weeks, the consensus team worked on adding property tests for different aspects of the UTxO HD prototype: era transitions, the mempool, and the backing store. Thanks to these tests, we were able to uncover a bug in the prototype. On the Genesis front, we benchmarked a different version of the ChainSync Jumping prototype to try to improve its performance, but this did not result in any noticeable speedup.

High-level status report

  • Finish the UTxO HD prototype: on track.
    • We focused on increasing test coverage for the UTxO-HD prototype:
      • We started implementing Cardano era-transition property tests.
      • We started implementing state-machine property-tests for the mempool.
      • We merged the mempool rewrite.
      • We started working on state-machine tests for the backing store. This uncovered a bug in the range-read implementation of the LMDB backing store.
  • Genesis: on track.
    • We benchmarked a version of the Genesis ChainSync Jumping prototype that spreads out the ChainSync updates over a longer period of time. This did not result in any noticeable speedup.
    • We investigated the overhead introduced by non-ChainSync components, but no conclusions could be drawn from the benchmarks we ran.

Workstreams

Finish the UTxO HD prototype

We focused on increasing test coverage for the UTxO HD prototype. We also merged the mempool rewrite.

Era transition property tests

We started implementing Cardano era-transition property tests, which are needed to make sure that the ledger tables are updated in the right way when we move from one era to the next. At the moment there are two important transitions:

  • Byron to Shelley: where all the UTxO is transferred from in-memory Byron state (which has no tables) to the ledger tables of the Shelley state.
  • Shelley to Allegra: where the AVVM addresses must be deleted.

We have tests for the Byron to Shelley transition and are working on adding the remaining ones.
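To give a flavor of these tests, here is a heavily simplified QuickCheck sketch of the Byron-to-Shelley property, using plain maps as hypothetical stand-ins for the real ledger states; the actual tests run the real era translation on real ledger states.

```haskell
import qualified Data.Map.Strict as Map
import Test.QuickCheck

type TxIn  = Int
type TxOut = Int

-- Byron keeps its UTxO purely in memory; Shelley keeps it in ledger tables.
newtype ByronState   = ByronState   { byronUtxo    :: Map.Map TxIn TxOut } deriving Show
newtype ShelleyState = ShelleyState { ledgerTables :: Map.Map TxIn TxOut } deriving Show

-- Hypothetical translation: the transition must move every UTxO entry from the
-- in-memory Byron state into the ledger tables of the Shelley state.
translateByronToShelley :: ByronState -> ShelleyState
translateByronToShelley = ShelleyState . byronUtxo

-- The property: no UTxO entry is lost or changed by the transition.
prop_transitionPreservesUtxo :: [(TxIn, TxOut)] -> Property
prop_transitionPreservesUtxo entries =
  let byron = ByronState (Map.fromList entries)
  in  ledgerTables (translateByronToShelley byron) === byronUtxo byron

main :: IO ()
main = quickCheck prop_transitionPreservesUtxo
```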

Mempool state-machine tests

We started implementing state-machine property tests for the mempool. The mempool is currently tested via pure property tests, which use a ledger state without tables. With the introduction of UTxO HD, testing the concurrent behavior of the mempool becomes crucially important (e.g. we now have to acquire locks to flush the backing store), and in addition we need to test a ledger state with tables. These needs led to the creation of a new set of property tests. In particular, we aim to run parallel state-machine tests that exercise the mempool in a way similar to how the node would use it.
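The following is a much-simplified, sequential sketch of that model-based style, with a hypothetical command set and a trivial list-based "implementation"; the real tests run commands in parallel against the actual mempool, backed by a ledger state with tables and the lock that guards flushing.

```haskell
import qualified Data.Set as Set
import Test.QuickCheck

-- Hypothetical command set; the real tests also cover revalidation and
-- flushing the backing store (which requires taking a lock).
data Cmd = AddTx Int | RemoveTx Int
  deriving Show

instance Arbitrary Cmd where
  arbitrary = oneof [AddTx <$> arbitrary, RemoveTx <$> arbitrary]

-- Pure model: the set of transaction ids the mempool should contain.
stepModel :: Set.Set Int -> Cmd -> Set.Set Int
stepModel m (AddTx tx)    = Set.insert tx m
stepModel m (RemoveTx tx) = Set.delete tx m

-- "Implementation": an insertion-ordered, duplicate-free list standing in for
-- the real mempool.
stepImpl :: [Int] -> Cmd -> [Int]
stepImpl txs (AddTx tx)    = if tx `elem` txs then txs else txs ++ [tx]
stepImpl txs (RemoveTx tx) = filter (/= tx) txs

-- After any sequence of commands, implementation and model must agree.
prop_implAgreesWithModel :: [Cmd] -> Property
prop_implAgreesWithModel cmds =
  Set.fromList (foldl stepImpl [] cmds) === foldl stepModel Set.empty cmds

main :: IO ()
main = quickCheck prop_implAgreesWithModel
```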

Backing store property tests

We started working on state-machine tests for the backing store that UTxO HD uses. These property tests uncovered errors in the range-reads implementation of the LMDB backing store. To facilitate fixing this bug, we made changes to the Haskell LMDB bindings.
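To illustrate the shape of these properties, here is a hedged sketch that checks a range read against a Data.Map model. rangeReadImpl is a hypothetical placeholder: in the real tests it drives the LMDB backing store through its Haskell bindings, whereas here it simply reuses the model, so the snippet only shows the comparison.

```haskell
import qualified Data.Map.Strict as Map
import Test.QuickCheck

-- Model: read at most 'count' key/value pairs with keys strictly greater than
-- the optional starting key, in ascending key order.
rangeReadModel :: Ord k => Maybe k -> Int -> Map.Map k v -> [(k, v)]
rangeReadModel mStart count m =
  take count $ case mStart of
    Nothing    -> Map.toAscList m
    Just start -> Map.toAscList (snd (Map.split start m))

-- Hypothetical placeholder for the operation under test (the LMDB backing
-- store in the real property tests).
rangeReadImpl :: Ord k => Maybe k -> Int -> Map.Map k v -> [(k, v)]
rangeReadImpl = rangeReadModel

-- The property: the implementation's range reads agree with the model.
prop_rangeReadMatchesModel :: Maybe Int -> NonNegative Int -> [(Int, Int)] -> Property
prop_rangeReadMatchesModel mStart (NonNegative count) kvs =
  let store = Map.fromList kvs
  in  rangeReadImpl mStart count store === rangeReadModel mStart count store

main :: IO ()
main = quickCheck prop_rangeReadMatchesModel
```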

Benchmarking the CSJ prototype

Prompted by previous benchmarks showing significant improvements in sync time by using more capabilities, we implemented a way to spread out the ChainSync updates over a larger period instead of firing them all at the same time. This didn't result in a noticeable speedup.

We also benchmarked the prototype with CSJ disabled (such that just the dynamo peer is running ChainSync, but e.g. BlockFetch still sees all peers) to rule out/confirm overhead by non-ChainSync (mainly BlockFetch) related components. This results in era-specific behavior (speed is like the prototype in Byron, but like the baseline in Shelley). This deserves a closer look in the future.

This diagram shows the respective syncing progress, starting at Genesis and continuing a good part into Shelley (with the dashed line indicating the Byron-to-Shelley transition).

  • Red: baseline
  • Green: CSJ prototype, 10 peers, jumps every 3000/f slots, jumps in clumps.
  • Blue: like Green, jumps are spread out.
  • Orange: variant with no jumping, to measure unrelated overhead.