SRE Team Update
High-level summary
The SRE team continues work on Cardano environment improvements and general maintenance.
Some notable recent changes, updates or improvements include:
- Cardano-parts, cardano-playground and cardano-mainnet were updated with cardano-node 10.6.4, cardano-db-sync release 13.6.0.8 and pre-release 13.7.0.2, and nix was patched for security vulnerabilities GHSA-g3g9-5vj6-r3gj / CVE-2026-39860.
- The ZFS AMI module was enhanced with a configurable percentage-based ARC cache sizing option derived from the node RAM.
- Buildkite infrastructure was updated to accommodate Daedalus Linux CI support.
- A van Rossem PV11 cost model governance vote was cast on preview.
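Percentage-based ARC sizing comes down to simple arithmetic on the node's RAM. The following is a minimal Python sketch, assuming that the option caps the ARC at a percentage of total RAM and that the result is applied through the standard zfs.zfs_arc_max kernel module parameter (a byte count). The function names are hypothetical and are not taken from the ami.nix module, whose exact semantics may differ.

```python
# Hypothetical helpers: illustrate sizing the ZFS ARC as a percentage of
# node RAM. The real option is boot.zfs.zfsArcPct in cardano-parts; the
# semantics below (a cap on total RAM) are an assumption, not confirmed.


def zfs_arc_max_bytes(ram_bytes: int, arc_pct: int) -> int:
    """Return an ARC size cap in bytes equal to arc_pct percent of ram_bytes."""
    if not 0 < arc_pct <= 100:
        raise ValueError(f"arc_pct must be in (0, 100], got {arc_pct}")
    # Integer arithmetic avoids floating-point rounding on large byte counts.
    return ram_bytes * arc_pct // 100


def arc_kernel_param(ram_bytes: int, arc_pct: int) -> str:
    """Render the cap as the zfs_arc_max kernel module parameter string."""
    return f"zfs.zfs_arc_max={zfs_arc_max_bytes(ram_bytes, arc_pct)}"


if __name__ == "__main__":
    gib = 1024**3
    # A node with 16 GiB of RAM and a 25% ARC share gets a 4 GiB cap.
    print(arc_kernel_param(16 * gib, 25))  # zfs.zfs_arc_max=4294967296
```

Deriving the cap from RAM rather than hard-coding a byte value lets one module setting scale across machine types with different memory sizes.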
Repository Work -- Merged
Cardano-mainnet
- Bumps cardano-node to 10.6.3 and then 10.6.4, with corresponding deployments
- Bumps cardano-db-sync to 13.6.0.8 and deploys it to the dbsyncs
- Adjusts alerts for the remaining block producer to reflect current stake levels
- Migrates resources out of me-central-1, due to stability issues, and into ap-southeast-6
- Destroys retired block producer machines and secrets
- Updates webserver and DNS resources to properly serve IOGP metadata for remaining pools that are unused and unfunded but not retired
- Adds CPU/memory usage panels and totals to the cardano-node.json and cardano-node-new-tracing.json Grafana dashboards
- See the PR description for additional details
Cardano-parts
- Bumps the cardano-node release to 10.6.4, the cardano-db-sync release to 13.6.0.8, and the cardano-db-sync pre-release to 13.7.0.2
- Bumps nix to address security vulnerabilities GHSA-g3g9-5vj6-r3gj and CVE-2026-39860
- Extends the ZFS AMI ami.nix nixosModule with a configurable boot.zfs.zfsArcPct option for percentage-based ARC cache sizing
- Updates the AWS EC2 spec to include new machine types missing from the existing spec
- Fixes a race condition in profile-aws-ec2-ephemeral.nix where chown could fail on an ephemeral file that had already disappeared
- Fixes breakage in the tcpTxOpt colmena module caused by a breaking change introduced in nixpkgs 25.11
- Adds CPU/memory usage panels and totals to the cardano-node.json and cardano-node-new-tracing.json Grafana dashboards
- Adds non-NixOS machine handling to the consistency-checking and update-ips recipes
- See the PR description for additional details
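The chown race fixed in profile-aws-ec2-ephemeral.nix is a classic time-of-check/time-of-use pattern: a file is listed, disappears, and the subsequent chown fails. The actual fix lives in the Nix profile and is not reproduced here. This is only a minimal Python sketch of the general technique, assuming that a vanished ephemeral file can safely be treated as already handled. The helper name chown_if_present is hypothetical.

```python
import os
import tempfile


def chown_if_present(paths, uid: int, gid: int) -> int:
    """Chown each path, skipping any that vanish after being listed.

    Returns the number of paths whose ownership was actually set.
    """
    changed = 0
    for path in paths:
        try:
            os.chown(path, uid, gid)
        except FileNotFoundError:
            # The ephemeral file was removed between listing and chown;
            # there is nothing left to fix, so skip it instead of failing.
            continue
        changed += 1
    return changed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        keep = os.path.join(tmp, "keep")
        gone = os.path.join(tmp, "gone")
        for p in (keep, gone):
            open(p, "w").close()
        listed = [keep, gone]
        os.remove(gone)  # simulate the file disappearing mid-run
        # Chowning to the current user and group works without root.
        print(chown_if_present(listed, os.getuid(), os.getgid()))  # 1
```

Catching only FileNotFoundError keeps other failures, such as permission errors, loud rather than silently skipped.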
Cardano-playground
- Bumps cardano-node to 10.6.4, cardano-db-sync to 13.6.0.8, and the cardano-db-sync pre-release to 13.7.0.2, with deployments to release environments
- Extends ami.nix with a configurable boot.zfs.zfsArcPct option for percentage-based ZFS ARC cache sizing
- Fixes a Buildkite NixOS container startup race condition with sops and repurposes a Buildkite machine for a Daedalus queue
- Adds CPU/memory usage panels and totals to Grafana dashboards
- Updates cardano-book for the 10.6.3 and 10.6.4 node releases
- Casts a governance vote on preview for the van Rossem PV11 cost model update, with a signed rationale and vote transaction
- See the PR description for additional details
Devx-ci
- This should bring hydra-tools back up to a sufficiently recent release (hydra-github-bridge 0.2.1.0), making it possible to layer further fixes on top of it (for example, recovering from hung PostgreSQL connections and not crashing while reading build logs).
- Adds 3 types of Oakhost Darwin machines, each with 3 available hydra build slots initially, pending further tuning
- These machines will likely be short-lived until a new hardware offering from Oakhost is available in a few months
- Reduces the number of hydra eval worker threads to 3, as 4 threads tend to cause semi-regular OOMs when four large evaluations run concurrently
