SRE Team Update

4 min read
John Lotoski
Service Reliability Engineer

High level summary

The SRE team continues work on Cardano environment improvements and general maintenance.

Some notable recent changes, updates or improvements include:

  • Cardano-parts and cardano-playground were updated with cardano-node 10.6.2, cardano-node pre-release 10.7.0, nixpkgs 25.11, ZFS AMI support, new Loki log dashboards, and extensive monitoring improvements including per-machine absent metrics alerting and mempool timeout alerts.

  • The dijkstra network was fully respun with updated secrets and configs, and a Van Rossem PV11 cost model governance action was prepared.

  • CloudFormation stack hardening was applied: dedicated S3 server access logs bucket, TLS-only bucket policies, DynamoDB deletion protection with PITR, and KMS encryption.

  • Ouroboros-network-ops was brought up to a recent cardano-parts release with new resource tagging for CloudFormation and OpenTofu resources.

Repository Work -- Merged

Cardano-airgap

cardano-airgap PR#13:

  • Adds midnight-cli to the air-gapped signing toolset

Cardano-mainnet

cardano-mainnet PR#42:

  • Deploys all nodes to 10.6.2, and all dbsyncs to 13.6.0.7
  • Upgrades nixpkgs to 25.11 and nix to 2.33-maint
  • Adds bootstrap OpenTofu environment and ZFS AMI NixOS module support
  • Adds Loki log shipping with four new log dashboards; removes superseded node-exporter Loki dashboard
  • Adds per-machine machine_metrics_absent alert and tx mempool timeout alerts, and tightens the blockHeight threshold
  • Hardens CloudFormation stack with TLS-only policies, DynamoDB deletion protection and PITR, and KMS encryption
  • Rotates the mainnet pool KES keys
  • See the PR description for additional details

Cardano-parts

cardano-parts PR#79:

  • Bumps cardano-node release to 10.6.2, pre-release to 10.7.0, cardano-db-sync release to 13.6.0.7, pre-release to 13.7.0.1, and other component updates
  • Bumps nixpkgs to 25.11 and nix to 2.33-maint with required compatibility fixes
  • Introduces ZFS AMI support via a new ami.nix nixosModule with tank/{root,nix,home,state} dataset layout and new bootstrap OpenTofu environment
  • Removes the deprecated Grafana Agent (EOL 2025-11-01), migrating fully to Grafana Alloy with Loki log shipping support
  • Adds four new Loki log dashboards: cardano-node-logs.json, cardano-node-logs-json.json, systemd-logs.json, and systemd-logs-json.json
  • Adds per-machine machine_metrics_absent alert with multi-offset detection; adds tx mempool timeout alerts; tightens blockHeight unchanged alert from 10 to 7 minutes
  • Hardens CloudFormation stack: dedicated S3 server access logs bucket, TLS-only bucket policies, DynamoDB deletion protection with PITR, and KMS encryption
  • Adds Van Rossem PV11 cost model JSON to template cost-models
  • Restructures cardano-node.json dashboard with mempool timeout panels, instance filtering, and restart/version-change annotations
  • Re-adds sanchonet support to process-compose stacks and template scripts
  • See the PR description for additional details
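
To illustrate what tightening the blockHeight unchanged alert from 10 to 7 minutes means in practice, here is a minimal Python sketch of the detection logic. It is not the alert rule from the PR, which lives in the monitoring configuration; the function name `block_height_stalled`, the `(timestamp, height)` sample format, and the one-minute sampling interval are assumptions made for illustration only.

```python
from typing import Sequence, Tuple

# Hypothetical sample type: (unix_timestamp_seconds, block_height).
Sample = Tuple[float, int]


def block_height_stalled(samples: Sequence[Sample], window_minutes: float = 7.0) -> bool:
    """Return True if the block height has not changed for at least `window_minutes`.

    `samples` must be sorted by timestamp, oldest first.
    """
    if not samples:
        # Missing data is a separate condition (absent-metrics alerting), not a stall.
        return False
    latest_ts, latest_height = samples[-1]
    unchanged_since = latest_ts
    # Walk backwards while the height stays equal to the latest value.
    for ts, height in reversed(samples):
        if height != latest_height:
            break
        unchanged_since = ts
    return (latest_ts - unchanged_since) >= window_minutes * 60


# Nine one-minute samples (0 to 8 minutes) with no new block.
stalled = [(minute * 60.0, 100) for minute in range(9)]
print(block_height_stalled(stalled, window_minutes=7))   # True
print(block_height_stalled(stalled, window_minutes=10))  # False
```

With these assumed samples, an 8-minute stall fires under the 7-minute threshold but would have gone unreported under the previous 10-minute one.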

Cardano-playground

cardano-playground PR#55:

  • Sets cardano-node release to 10.6.2, pre-release to 10.7.0, cardano-db-sync to 13.6.0.7, pre-release to 13.7.0.1
  • Upgrades nixpkgs to 25.11 and nix to 2.33-maint
  • Adds bootstrap OpenTofu environment and ZFS AMI NixOS module support
  • Adds Loki log shipping with four new log dashboards; removes superseded node-exporter Loki dashboard
  • Adds per-machine machine_metrics_absent alert and tx mempool timeout alerts, and tightens the blockHeight threshold
  • Creates dijkstra respin with new secrets, updated network configs, and Van Rossem PV11 cost model governance action
  • Converts preview3-bp-c-1 and mainnet1-rel-a-3 to LSM storage backend
  • Hardens CloudFormation stack with TLS-only policies, DynamoDB deletion protection and PITR, and KMS encryption
  • Performs a large colmena cleanup: introduces a group-based import system and removes the metrics-scraper module
  • Re-integrates sanchonet via upstream iohk-nix
  • See the PR description for additional details
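
For readers unfamiliar with TLS-only bucket policies, the sketch below builds the commonly used AWS pattern in Python: a single Deny statement keyed on the `aws:SecureTransport` condition. This is an illustration of the general technique, not the policy from the PR; the bucket name `example-state-bucket`, the function name, and the statement id are hypothetical.

```python
import json


def tls_only_bucket_policy(bucket_name: str) -> dict:
    """Build a bucket policy that denies any S3 request not made over TLS."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",  # hypothetical statement id
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                # Matches requests that arrive over plain HTTP.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


# The bucket name is a placeholder, not one used in the repositories above.
print(json.dumps(tls_only_bucket_policy("example-state-bucket"), indent=2))
```

Because the statement is an explicit Deny, it takes precedence over any Allow granted elsewhere, so plain HTTP access is rejected regardless of other permissions.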

Ouroboros-network-ops

ouroboros-network-ops PR#30:

  • Bumps cardano-parts from v2025-06-24 to post-v2025-08-14
  • Adds new resource tags to CloudFormation and OpenTofu resources: owner, project, costCenter
  • Updates pre-existing organization and environment tags
  • Applies breaking change updates from cardano-parts release
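
A tagging convention like this is easy to check mechanically. The following Python sketch reports which of the new tag keys (owner, project, costCenter) are missing from a resource; the function name, the tag values, and the idea of running such a check are assumptions for illustration and are not taken from the PR.

```python
from typing import Dict, List

# Tag keys named in the PR summary above.
REQUIRED_TAG_KEYS = ("owner", "project", "costCenter")


def missing_tags(resource_tags: Dict[str, str]) -> List[str]:
    """Return the required tag keys that are absent or empty on a resource."""
    return [key for key in REQUIRED_TAG_KEYS if not resource_tags.get(key)]


# Hypothetical tag values; only the key names come from the PR summary.
tags = {"owner": "sre", "project": "example-project", "environment": "prod"}
print(missing_tags(tags))  # ['costCenter']
```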

Devx-ci

devx-ci PR#152:

  • Bumps nix on linux and darwin hosts and guests to resolve GHSA-g3g9-5vj6-r3gj / CVE-2026-39860
  • Also bumps the darwin guest bootstrap nixpkgs version in apply.sh from 25.05 to 25.11