
SRE Team Update

John Lotoski
Service Reliability Engineer

High level summary

The SRE team continues work on Cardano environment improvements and general maintenance.

Some notable recent changes, updates or improvements include:

  • Cardano-parts, cardano-playground and cardano-mainnet were updated to cardano-node 10.6.4, cardano-db-sync release 13.6.0.8 and pre-release 13.7.0.2, and nix was patched for security vulnerabilities GHSA-g3g9-5vj6-r3gj / CVE-2026-39860.

  • The ZFS AMI module was enhanced with a configurable percentage-based ARC cache sizing option derived from the node RAM.

  • Buildkite infrastructure was updated to accommodate Daedalus Linux CI support.

  • A van Rossem PV11 cost model governance vote was cast on the preview network.
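Percentage-based ARC sizing comes down to simple arithmetic. The sketch below reads total RAM from /proc/meminfo-formatted text and turns a percentage into a byte ceiling. It is not the ami.nix implementation, and the helper names are hypothetical. It assumes the result would be applied through the OpenZFS `zfs_arc_max` module parameter, which takes a size in bytes. For how `boot.zfs.zfsArcPct` is actually wired up, see cardano-parts PR#81.

```python
import re


def mem_total_bytes(meminfo_text: str) -> int:
    """Return MemTotal from /proc/meminfo-formatted text, in bytes (hypothetical helper)."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB$", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found")
    # /proc/meminfo labels values "kB" but they are kibibytes.
    return int(match.group(1)) * 1024


def arc_max_bytes(total_ram_bytes: int, arc_pct: int) -> int:
    """Size the ZFS ARC ceiling as a whole-number percentage of total RAM."""
    if not 0 < arc_pct <= 100:
        raise ValueError("arc_pct must be in (0, 100]")
    return total_ram_bytes * arc_pct // 100


# Usage with a sample /proc/meminfo excerpt (illustrative values only).
sample = "MemTotal:       16384000 kB\nMemFree:         1024000 kB\n"
ram = mem_total_bytes(sample)
print(f"zfs.zfs_arc_max={arc_max_bytes(ram, 25)}")
```

With a 25% setting on a machine reporting 16384000 kB, this yields a ceiling of 4194304000 bytes. Deriving the value from RAM rather than hard-coding it lets one module setting scale across different instance sizes.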

Repository Work -- Merged

Cardano-mainnet

cardano-mainnet PR#43:

  • Bumps cardano-node to 10.6.3 and then to 10.6.4, with corresponding deployments
  • Bumps cardano-db-sync to 13.6.0.8 and deploys it to the dbsync machines
  • Adjusts alerts for the remaining block producer to reflect current stake levels
  • Migrates resources from me-central-1 to ap-southeast-6 due to stability issues in me-central-1
  • Destroys retired block producer machines and secrets
  • Updates webserver and DNS resources to properly serve IOGP metadata for remaining pools that are unused and unfunded but not retired
  • Adds CPU/memory usage panels and totals to cardano-node.json and cardano-node-new-tracing.json Grafana dashboards
  • See the PR description for additional details

Cardano-parts

cardano-parts PR#81:

  • Bumps cardano-node release to 10.6.4, cardano-db-sync release to 13.6.0.8, and cardano-db-sync pre-release to 13.7.0.2
  • Bumps nix to address security vulnerabilities GHSA-g3g9-5vj6-r3gj and CVE-2026-39860
  • Extends the ZFS AMI ami.nix nixosModule with a configurable boot.zfs.zfsArcPct option for percentage-based ARC cache sizing
  • Updates the AWS EC2 spec to include new machine types that were missing from it
  • Fixes a race condition in profile-aws-ec2-ephemeral.nix where chown could fail on a disappeared ephemeral file
  • Fixes the tcpTxOpt colmena module, which was broken by a change introduced in nixpkgs 25.11
  • Adds CPU/memory usage panels and totals to cardano-node.json and cardano-node-new-tracing.json Grafana dashboards
  • Adds non-NixOS machine handling to consistency-checking and update-ips recipes
  • See the PR description for additional details
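The chown race fixed in profile-aws-ec2-ephemeral.nix is a check-then-act problem: a file found during a scan can be deleted before chown reaches it. The PR's actual change is not reproduced here. The Python sketch below, with hypothetical helper names, only illustrates the general pattern of treating a vanished file as a non-error instead of failing the whole pass.

```python
import os
import tempfile


def chown_if_present(path: str, uid: int, gid: int) -> bool:
    """Chown path, tolerating the file vanishing before the call.

    Returns True if ownership was set, False if the file had already
    disappeared (e.g. an ephemeral file removed concurrently).
    """
    try:
        os.chown(path, uid, gid)
    except FileNotFoundError:
        # Removed between discovery and chown: nothing left to fix.
        return False
    return True


def chown_tree(root: str, uid: int, gid: int) -> int:
    """Chown every entry under root, skipping entries that disappear mid-walk."""
    changed = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if chown_if_present(os.path.join(dirpath, name), uid, gid):
                changed += 1
    return changed


# Usage: a file that vanishes before chown is skipped, not fatal.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "ephemeral.tmp")
    open(path, "w").close()
    os.remove(path)  # simulate the file disappearing mid-scan
    print(chown_if_present(path, os.getuid(), os.getgid()))
```

Catching only `FileNotFoundError` keeps other failures, such as permission errors, loud. A disappeared file is the one case that is safe to ignore.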

Cardano-playground

cardano-playground PR#56:

  • Bumps cardano-node to 10.6.4, cardano-db-sync to 13.6.0.8, and cardano-db-sync pre-release to 13.7.0.2 with deployments to release environments
  • Extends ami.nix with configurable boot.zfs.zfsArcPct option for percentage-based ZFS ARC cache sizing
  • Fixes a race condition between buildkite NixOS container startup and sops, and repurposes a buildkite machine for a Daedalus queue
  • Adds CPU/memory usage panels and totals to Grafana dashboards
  • Updates cardano-book for 10.6.3 and 10.6.4 node releases
  • Casts a governance vote on preview for the van Rossem PV11 cost model update with signed rationale and vote transaction
  • See the PR description for additional details
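The buildkite container race is only summarized above, so what follows is a hypothetical illustration and not the fix that was merged. It sketches a startup gate that waits, with a timeout, for a sops-decrypted secret file to exist and be non-empty before the agent launches. The function name, timings and demo path are assumptions. In a NixOS setup, expressing the dependency through systemd unit ordering is often a cleaner option than polling. This sketch only shows the shape of the race.

```python
import os
import tempfile
import time


def wait_for_secret(path: str, timeout_s: float = 30.0, poll_s: float = 0.5) -> bool:
    """Poll until a decrypted secret file exists and is non-empty.

    Returns True once the file is ready, False if timeout_s elapses first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if os.path.getsize(path) > 0:
                return True
        except OSError:
            # Not created (or not readable) yet; keep polling.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


if __name__ == "__main__":
    # Hypothetical demo: the secret is absent at first, then written.
    with tempfile.TemporaryDirectory() as d:
        secret = os.path.join(d, "buildkite-token")  # stand-in for a sops output path
        print(wait_for_secret(secret, timeout_s=0.2, poll_s=0.05))
        with open(secret, "w") as f:
            f.write("token")
        print(wait_for_secret(secret, timeout_s=0.2, poll_s=0.05))
```

Requiring a non-empty file, not just an existing one, guards against starting while the secret is still being written.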

Devx-ci

devx-ci PR#154:

  • This should bring hydra-tools back up to a sufficiently recent release (hydra-github-bridge 0.2.1.0), making it possible to layer further fixes on top, such as recovering hung PostgreSQL connections and not crashing while reading build logs.

devx-ci PR#155:

  • Adds 3 types of Oakhost Darwin machines, each with 3 available hydra build slots initially, pending further tuning
  • These machines will likely be short-lived until a new hardware offering from Oakhost is available in a few months
  • Sets the number of hydra eval worker threads to 3, as 4 concurrent large evals tend to cause semi-regular OOMs
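Capping eval workers bounds peak memory to roughly the number of workers times the footprint of the largest concurrent evals. That is why running 3 rather than 4 can avoid OOMs when large evaluations overlap. The sketch below is a generic Python thread pool, not Hydra's evaluator or its configuration. Names like `MAX_EVAL_WORKERS` and `fake_eval` are hypothetical, and the fake eval only sleeps, so it demonstrates the concurrency cap rather than real memory use.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_EVAL_WORKERS = 3  # hypothetical knob mirroring the worker count chosen above


def run_evals(jobsets, evaluate, max_workers=MAX_EVAL_WORKERS):
    """Evaluate jobsets with at most max_workers running at once, preserving order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(evaluate, jobsets))


class PeakTracker:
    """Records the highest number of evaluations observed running at once."""

    def __init__(self):
        self._lock = threading.Lock()
        self.running = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.running += 1
            self.peak = max(self.peak, self.running)

    def __exit__(self, *exc):
        with self._lock:
            self.running -= 1
        return False


tracker = PeakTracker()


def fake_eval(name):
    # Stand-in for a memory-heavy evaluation; it only sleeps briefly.
    with tracker:
        time.sleep(0.05)
    return f"{name}: ok"


results = run_evals([f"jobset-{i}" for i in range(8)], fake_eval)
print(tracker.peak, results[0])
```

Extra work queues instead of running in parallel, trading some eval latency for a predictable memory ceiling.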