SRE Team Update
High level summary
The SRE team continues work on Cardano environment improvements and general maintenance.
Some notable recent changes, updates or improvements include:
- Trace dispatcher was migrated out of cardano-node and into its own repo: hermod-tracing.
- A number of CI improvements to Darwin builder configuration and general CI monitoring and alerting were merged.
- The cardano-node 10.7.0 pre-release SRE contribution work was completed.
- Dijkstra network had the van Rossem cost model for PV11 preparation submitted, ratified and enacted.
- Some cloud resources were relocated to more stable areas after disruption due to conflict in the Middle East.
Repository Work -- Merged
Cardano-monitoring
- Enables Loki alert rule evaluation, and makes the alerts visible in Grafana
Cardano-node
- Bumps iohkNix, updates MinNodeVersion to 10.7.0 and refreshes mainnet-peer-snapshot.json and other CI files.
- Adds independent lsmDatabasePath NixOS option with uniqueness assertion and mutual-exclusion check against LMDB per instance.
- Adds kes-agent/kes-agent-control (Linux only) and dmq-node (all platforms) to release binaries.
- Adds cardano-node-dbtools NixOS test covering db-synthesizer, db-analyser, db-truncater, and the GHC-asserted synthesizer binary against a cardano-testnet create-env environment.
- Adds --shelley-kes-agent-socket support to run-node and cardano-node-service.nix. Expands the KES assertion to cover three valid forging configurations: relay (none), direct KES key, and KES agent socket.
- Adds CARDANO_TRACER_SOCKET_NETWORK_{ACCEPT,CONNECT} tracer socket options to run-node.
- Hardens all node and tracer entrypoint/launch scripts with set -euo pipefail, safe ${VAR:-} expansion throughout, pre-flight file existence checks, and exec for clean process replacement.
- Consolidates separate relay/block-producer run functions into a single runNode. Derives GENESIS_JSON from CARDANO_CONFIG directory to support non-mainnet deployments.
- Renames runCommandNoCCLocal → runCommandLocal for nixpkgs 25.11.
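The script hardening and GENESIS_JSON derivation described in the list above can be illustrated with a minimal sketch. This is not the actual run-node script: the function names (derive_genesis_json, require_file, launch_node) and the genesis file name shelley-genesis.json are assumptions made for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the hardening pattern; not the real entrypoint.
# -e: exit on error, -u: error on unset variables, pipefail: a pipeline
# fails if any stage in it fails.
set -euo pipefail

# Derive a genesis path from the directory that holds the node config.
# The file name "shelley-genesis.json" is an assumption for illustration.
derive_genesis_json() {
  local config="${1:-}"
  [ -n "$config" ] || { echo "error: config path is required" >&2; return 1; }
  printf '%s/shelley-genesis.json\n' "$(dirname "$config")"
}

# Pre-flight check: fail early with a clear message if a file is missing.
require_file() {
  local label="$1" path="${2:-}"
  [ -n "$path" ] || { echo "error: $label is not set" >&2; return 1; }
  [ -f "$path" ] || { echo "error: $label not found: $path" >&2; return 1; }
}

# Validate inputs, then replace the shell with the node process via exec so
# that signals reach the node directly instead of a wrapper shell.
launch_node() {
  # ${VAR:-} expands to an empty string instead of aborting under `set -u`.
  local config="${CARDANO_CONFIG:-}"
  require_file CARDANO_CONFIG "$config"
  exec cardano-node run --config "$config"
}
```

Calling launch_node only at the end of a script, after all checks have passed, keeps failures early and readable; the helper functions stay free of side effects so they can be exercised on their own.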
Devx-ci
- Upgrades darwin CI infrastructure with version bumps, guest VM lifecycle management, and maintenance improvements.
- Darwin related infrastructure upgrades:
  - Nix 2.32 → 2.33-maintenance (hosts and guests); nix.package now set explicitly
  - nix-darwin 25.05 → 25.11 (guests)
  - UTM versioned per architecture:
    - aarch64-darwin: 4.5.4 → 5.0.2
    - x86_64-darwin: pinned to 4.6.5 as UTM > 4.6.5 breaks display driver compatibility with macOS Sequoia+ on Intel
  - ca-derivations experimental feature enabled on hosts and guests
  - Guest bootstrap nix version bumped 2.28.3 → 2.32.5
  - Adds a small C binary at /usr/local/bin/nix-daemon-launcher to work around macOS launchd sandbox blocking .dylib loads from the APFS /nix volume
  - darwin.sh gains --system/-s (default: aarch64-darwin) flags; argbash upgraded 2.10.0 → 2.11.0
  - Auth-keys-hub is now utilized by the guests and legacy ops-lib usage has been removed
  - A percentage-based threshold garbage collection has been implemented, deriving thresholds from disk size rather than fixed values
- Bumped nixpkgs-gh-runners v2.330.0 → v2.332.0
- See the PR description for additional details
- Deploy hydra-tools hydra-github-bridge 0.2.1.0
- Accommodate the nixos hydra-github-bridge module with an extra secrets file
- Pause the new hydra-github-bridge usage until after pre-release of 10.7.0
- Remove zramSwap to make more physical RAM available to Hydra
- Reduce max concurrent evals/jobs down to 4 to keep throughput manageable
- Rekey all secrets to accommodate contributions from another SRE
- Improves alerting infrastructure using Mimir Alertmanager and Loki ruler.
- Sets up Dead Man's Snitch.
- Infrastructure changes:
  - Moved OpenTofu configuration from perSystem/packages/opentofuConfig/ to flake/opentofu/ for consistency with other SRE repos
  - Added Mimir provider for Prometheus-style alerting rules
  - Added Loki provider for log-based alerting rules
  - Configured Mimir Alertmanager to route all alerts to PagerDuty
  - Added various metrics based and log based alerts
- See the PR description for additional details
- Alert on every OOM and set up annotations to show Alertmanager alerts.
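The percentage-based garbage collection thresholds mentioned in the Darwin upgrades above derive their limits from disk size rather than fixed byte counts. A minimal sketch of that arithmetic follows; the function names and the example percentages are hypothetical and do not reflect the actual devx-ci implementation.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of percentage-derived GC thresholds.
set -euo pipefail

# Bytes corresponding to PCT percent of TOTAL bytes (integer arithmetic).
pct_of() {
  local total="$1" pct="$2"
  echo $(( total * pct / 100 ))
}

# Print how many bytes should be freed to bring free space back up to
# TARGET_PCT of the disk, or 0 if free space is still at or above MIN_PCT.
bytes_to_free() {
  local total="$1" avail="$2" min_pct="$3" target_pct="$4"
  local min_free target_free
  min_free="$(pct_of "$total" "$min_pct")"
  target_free="$(pct_of "$total" "$target_pct")"
  if [ "$avail" -ge "$min_free" ]; then
    echo 0
  else
    echo $(( target_free - avail ))
  fi
}
```

With a 1 TB disk, a 10% minimum and a 20% target, a host with 50 GB free would be asked to free 150 GB, while a host with 150 GB free would be left alone. A non-zero result could, for example, be passed to nix-store --gc --max-freed; the same percentages scale automatically to builders with larger or smaller disks.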
Hermod-tracing
- Adds hydraJobs to flake top level attrs
- Adds aarch64-linux
- Tests hydra integration
- Makes explicit required and nonrequired aggregate jobs
- Moves default pkgs to trace-dispatcher
Iohk-nix
- Include updated configs from respun dijkstra net from 2026-02-19
- Re-add sanchonet back to the available environments since it is persisting as a long lived community test network
- Update per-environment useLedgerAfterSlot values
- Update per-environment peer-snapshot.json files
- Update MinNodeVersion to 10.7.0 as the peer-snapshot files made a version breaking change
Usdcx-infra
- Adds missing series eval resolution targets
Repository Work In Progress -- PRs and Branches
- Cardano-mainnet: https://github.com/input-output-hk/cardano-mainnet/pull/42
- Cardano-node: https://github.com/IntersectMBO/cardano-node/pull/6410
- Cardano-parts: https://github.com/input-output-hk/cardano-parts/pull/79
- Cardano-playground: https://github.com/input-output-hk/cardano-playground/pull/55
- Devx-ci: https://github.com/input-output-hk/devx-ci/pull/143
