
SRE Team Update

Michael Fellinger

High level summary

The SRE team is focused on three efforts: the Equinix Metal migration, replacing Hydra with Cicero, and a new version of Spongix.

Lower level summary

OpenZiti

  • Work is ongoing on our OpenZiti integration into Bitte in [bitte-zt].
  • Deployed the Darwin CI Ziti service to CI-World in [ci-world-commit-d40f4d].
  • Filed multiple issues and held extensive discussions with the OpenZiti developers; thanks to them, we're making rapid progress.
  • Working on integrating Equinix bare-metal machines into AWS World Bitte clusters, using a Ziti ZTNA network overlay to bridge networking between the two environments and extending IAM to the Equinix machines for Nomad client onboarding.
  • A Nix Flake for most of our OpenZiti dependencies, including the Console, Controller, Edge Tunnel, and Router, is now available at [openziti-bins].
  • The Flake also includes WIP NixOS modules for these components.
  • Tested the official Ziti Desktop Edge app for Darwin x86_64 with GUI; it works, with no issues seen so far.
  • Moved the console behind the Traefik routing service (zac.$DOMAIN); the controller and edge router stay at zt.$DOMAIN but have registered Consul services.

Cicero & Tullia Integrations

Cicero & Tullia Features

  • Improvements to Tullia task aggregation to make [cardano-addresses] build correctly.
  • Improved the default for tags in the Tullia CUE library [tullia-commit-4df3c5d].
  • Put cache.nixos.org back into cache.iog.io's upstreams. This is now considered a public cache again; without it, some Cicero evaluations had to build huge packages.
  • Started working on a flake-parts module for Tullia.
  • Started working on cutting down Tullia task build time by putting facts in JSON files.
  • Fixed hitting the kernel's argument-length limit by reading Tullia's DAG from a file.
  • Merged [tullia-pull-9], which fixes several issues related to error reporting and escaping.
  • Added Mac builders in Cicero on CI-World.
  • Started work on Tullia invocation caching.
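The argument-limit fix listed above follows a common pattern: rather than passing a large payload as command-line arguments, which the kernel caps (exec fails with E2BIG, "Argument list too long"), write the payload to a file and pass only its path. Below is a minimal Python sketch of that pattern. It is not Tullia's implementation; the JSON DAG layout (task name mapped to its dependencies) and the `run_with_dag_file` helper are hypothetical.

```python
import json
import os
import subprocess
import sys
import tempfile

# Hypothetical DAG shape: task name -> names of the tasks it depends on.
Dag = dict[str, list[str]]

# Child program: reads the DAG from the file named in argv[1] and echoes it back.
CHILD = (
    "import json, sys\n"
    "with open(sys.argv[1]) as f:\n"
    "    dag = json.load(f)\n"
    "print(json.dumps(dag))\n"
)


def run_with_dag_file(dag: Dag) -> Dag:
    """Hand a DAG to a child process through a temporary JSON file.

    Only the short file path goes on the command line, so the argument
    size stays constant no matter how large the DAG grows.
    """
    fd, path = tempfile.mkstemp(suffix=".json")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(dag, f)
        result = subprocess.run(
            [sys.executable, "-c", CHILD, path],
            check=True,
            capture_output=True,
            text=True,
        )
        return json.loads(result.stdout)
    finally:
        # Clean up the temporary file even if the child fails.
        os.remove(path)
```

The payload size now only affects disk and stdout, not argv, so the same call works for a three-task DAG and a twenty-thousand-task one.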

Spongix

  • Made a lot of progress on an SQLite-backed version of Spongix. It already supports the full HTTP binary cache protocol, but still lacks comprehensive testing, some tuning, and recursive lookups.
  • Took the first steps toward implementing the nix-daemon ssh-ng protocol, so Spongix can be used via SSH and we can get rid of basic auth.
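For context on the HTTP binary cache protocol: a Nix client looks up a store path by requesting `/<hash>.narinfo`, where `<hash>` is the hash part of the store path's basename, and the response is plain text made of `Key: value` lines (`StorePath`, `URL`, `Compression`, `NarHash`, `NarSize`, `References`, and so on). The Python sketch below shows one way an SQLite table could index narinfo files for that lookup. The schema, the `NarinfoStore` class, and the sample values are hypothetical and do not reflect Spongix's actual implementation.

```python
import sqlite3
from typing import Optional


def parse_narinfo(text: str) -> dict[str, str]:
    """Parse the 'Key: value' lines of a .narinfo file."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only; values such as Sig contain colons.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


class NarinfoStore:
    """Toy SQLite-backed narinfo index (hypothetical schema)."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS narinfo (hash TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )

    def put(self, text: str) -> str:
        """Store a narinfo body keyed by its store path hash; return the hash."""
        fields = parse_narinfo(text)
        basename = fields["StorePath"].rsplit("/", 1)[-1]
        store_hash = basename.split("-", 1)[0]
        with self.db:  # commits the transaction on success
            self.db.execute(
                "INSERT OR REPLACE INTO narinfo VALUES (?, ?)", (store_hash, text)
            )
        return store_hash

    def get(self, url_path: str) -> Optional[str]:
        """Answer GET /<hash>.narinfo; None stands in for an HTTP 404."""
        name = url_path.lstrip("/")
        if not name.endswith(".narinfo"):
            return None
        row = self.db.execute(
            "SELECT body FROM narinfo WHERE hash = ?", (name[: -len(".narinfo")],)
        ).fetchone()
        return row[0] if row else None
```

Keying rows by the hash part mirrors the URL the client requests, so a lookup is a single primary-key query.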

Bugs

  • Discovered a Cicero bug where Nomad reschedules cause the GitHub commit status to get stuck in pending.
  • Discovered a Cicero race condition around concurrent transactions for codependent actions.
  • Fixed a Tullia task-ordering bug in [cardano-addresses].
  • Diagnosed a Cicero action that was not triggered in [abcirdc].
  • Fixed meta/description of the Tullia package in [tullia-pull-7]
  • Added Vault token loop alerts in [bitte-cells-pull-40].
  • Ongoing investigation into recurring Patroni and nomad-follower issues related to token rotation.