72 posts tagged with "sre"

SRE Team Update

· 5 min read
John Lotoski
Service Reliability Engineer

High level summary

The SRE team continues work on Cardano environment improvements and general maintenance.

Some notable recent changes, updates or improvements include:

  • Trace dispatcher was migrated out of cardano-node and into its own repo: hermod-tracing

  • A number of CI improvements to Darwin builder configuration and general CI monitoring and alerting were merged.

  • The cardano-node 10.7.0 pre-release SRE contribution work was completed.

  • Dijkstra network had the van Rossem cost model for PV11 preparation submitted, ratified and enacted.

  • Some cloud resources were relocated to more stable areas after disruption due to conflict in the Middle East.

Repository Work -- Merged

Cardano-monitoring

cardano-monitoring PR#6:

  • Enables Loki alert rule evaluation, and makes the alerts visible in Grafana

Cardano-node

cardano-node PR#6478

  • Bumps iohkNix, updates MinNodeVersion to 10.7.0 and refreshes mainnet-peer-snapshot.json and other CI files.
  • Adds independent lsmDatabasePath NixOS option with uniqueness assertion and mutual-exclusion check against LMDB per instance.
  • Adds kes-agent/kes-agent-control (Linux only) and dmq-node (all platforms) to release binaries.
  • Adds cardano-node-dbtools NixOS test covering db-synthesizer, db-analyser, db-truncater, and the GHC-asserted synthesizer binary against a cardano-testnet create-env environment.
  • Adds --shelley-kes-agent-socket support to run-node and cardano-node-service.nix. Expands the KES assertion to cover three valid forging configurations: relay (none), direct KES key, and KES agent socket.
  • Adds CARDANO_TRACER_SOCKET_NETWORK_{ACCEPT,CONNECT} tracer socket options to run-node.
  • Hardens all node and tracer entrypoint/launch scripts with set -euo pipefail, safe ${VAR:-} expansion throughout, pre-flight file existence checks, and exec for clean process replacement.
  • Consolidates separate relay/block-producer run functions into a single runNode. Derives GENESIS_JSON from CARDANO_CONFIG directory to support non-mainnet deployments.
  • Renames runCommandNoCCLocal → runCommandLocal for nixpkgs 25.11.
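
The three valid forging configurations named above (relay, direct KES key, KES agent socket) can be illustrated with a minimal sketch. This is not the NixOS assertion from the PR: the field names are hypothetical, and the sketch assumes that setting both a KES key and a KES agent socket at once is the case being rejected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ForgingConfig:
    # Hypothetical field names, not the actual NixOS option names.
    kes_key: Optional[str] = None           # path to a KES signing key file
    kes_agent_socket: Optional[str] = None  # path to a KES agent socket


def forging_mode(cfg: ForgingConfig) -> str:
    """Classify a config as one of the three modes, or raise if ambiguous."""
    # Assumption: a key file and an agent socket are mutually exclusive.
    if cfg.kes_key and cfg.kes_agent_socket:
        raise ValueError("set either a KES key or a KES agent socket, not both")
    if cfg.kes_key:
        return "direct-kes-key"
    if cfg.kes_agent_socket:
        return "kes-agent-socket"
    # Neither is set: the node does not forge blocks and acts as a relay.
    return "relay"
```

Failing at configuration time, rather than at node start, is the general benefit of this kind of check: an ambiguous deployment never reaches a running host.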

Devx-ci

devx-ci PR#145:

  • Upgrades darwin CI infrastructure with version bumps, guest VM lifecycle management, and maintenance improvements.
  • Darwin-related infrastructure upgrades:
    • Nix 2.32 → 2.33-maintenance (hosts and guests); nix.package now set explicitly
    • nix-darwin 25.05 → 25.11 (guests)
    • UTM versioned per architecture:
      • aarch64-darwin: 4.5.4 → 5.0.2
      • x86_64-darwin: pinned to 4.6.5 as UTM > 4.6.5 breaks display driver compatibility with macOS Sequoia+ on Intel
    • ca-derivations experimental feature enabled on hosts and guests
    • Guest bootstrap nix version bumped 2.28.3 → 2.32.5
    • Adds a small C binary at /usr/local/bin/nix-daemon-launcher to work around macOS launchd sandbox blocking .dylib loads from the APFS /nix volume
    • darwin.sh gains --system/-s (default: aarch64-darwin) flags; argbash upgraded 2.10.0 → 2.11.0
    • Auth-keys-hub is now utilized by the guests and legacy ops-lib usage has been removed
    • A percentage-based threshold garbage collection has been implemented, deriving thresholds from disk size rather than fixed values
  • Bumped nixpkgs-gh-runners v2.330.0 → v2.332.0
  • See the PR description for additional details
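
As an illustration of the percentage-based garbage collection thresholds mentioned above, the sketch below derives a "start collecting" and a "stop collecting" free-space threshold from the disk size. The function name and the 10% / 25% defaults are assumptions chosen for the example, not values taken from the PR.

```python
def gc_thresholds(disk_bytes: int,
                  min_free_pct: float = 10.0,
                  max_free_pct: float = 25.0) -> tuple[int, int]:
    """Return (min_free, max_free) in bytes, scaled to the disk size.

    min_free: free space below which collection would start.
    max_free: free space at which collection would stop.
    Both percentages are hypothetical defaults for illustration only.
    """
    if disk_bytes <= 0:
        raise ValueError("disk size must be positive")
    if not 0 < min_free_pct < max_free_pct <= 100:
        raise ValueError("require 0 < min_free_pct < max_free_pct <= 100")
    min_free = int(disk_bytes * min_free_pct / 100)
    max_free = int(disk_bytes * max_free_pct / 100)
    return min_free, max_free
```

Because the thresholds scale with the disk, the same configuration can be reused on builders with different volume sizes without hand-tuning fixed byte values.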

devx-ci PR#146:

  • Deploy hydra-tools hydra-github-bridge 0.2.1.0

devx-ci PR#147:

  • Accommodate the nixos hydra-github-bridge module with an extra secrets file

devx-ci PR#148:

  • Pause the new hydra-github-bridge usage until after pre-release of 10.7.0
  • Remove zramSwap to make more physical RAM available to Hydra
  • Reduce max concurrent evals/jobs down to 4 to keep throughput manageable

devx-ci PR#149:

  • Rekey all secrets to accommodate contributions from another SRE

devx-ci PR#150:

  • Improves alerting infrastructure using Mimir Alertmanager and Loki ruler.
  • Sets up Dead Man's Snitch.
  • Infrastructure changes:
    • Moved OpenTofu configuration from perSystem/packages/opentofuConfig/ to flake/opentofu/ for consistency with other SRE repos
    • Added Mimir provider for Prometheus-style alerting rules
    • Added Loki provider for log-based alerting rules
    • Configured Mimir Alertmanager to route all alerts to PagerDuty
    • Added various metrics-based and log-based alerts
  • See the PR description for additional details
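
Dead Man's Snitch is a heartbeat monitor: it raises an alarm when an expected check-in stops arriving, which makes it possible to notice when the alerting pipeline itself has gone quiet. The general idea can be sketched as follows. The class and its fields are hypothetical and do not reflect the service's API or the configuration in the PR.

```python
from dataclasses import dataclass


@dataclass
class DeadMansSwitch:
    """Fires when no heartbeat has arrived within the expected interval."""
    interval_s: float        # maximum allowed gap between heartbeats
    last_heartbeat_s: float  # timestamp of the most recent heartbeat

    def heartbeat(self, now_s: float) -> None:
        # Called each time the monitored pipeline checks in.
        self.last_heartbeat_s = now_s

    def is_firing(self, now_s: float) -> bool:
        # Silence for longer than the interval is treated as a failure.
        return now_s - self.last_heartbeat_s > self.interval_s
```

The inversion is the point of the design: ordinary alerts signal that something happened, while this one signals that something expected did not happen.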

devx-ci PR#151:

  • Alert on every OOM and set up annotations to show Alertmanager alerts.

Hermod-tracing

hermod-tracing PR#2

  • Adds hydraJobs to flake top level attrs
  • Adds aarch64-linux
  • Tests hydra integration
  • Makes explicit required and nonrequired aggregate jobs
  • Moves default pkgs to trace-dispatcher

Iohk-nix

iohk-nix PR#610:

  • Include updated configs from respun dijkstra net from 2026-02-19
  • Re-add sanchonet to the available environments since it is persisting as a long-lived community test network
  • Update per-environment useLedgerAfterSlot values
  • Update per-environment peer-snapshot.json files
  • Update MinNodeVersion to 10.7.0 as the peer-snapshot files made a version breaking change

Usdcx-infra

usdcx-infra PR#5:

  • Adds missing series eval resolution targets

Repository Work In Progress -- PRs and Branches

SRE Team Update

· 2 min read
John Lotoski
Service Reliability Engineer

High level summary

The SRE team continues work on Cardano environment improvements and general maintenance.

Some notable recent changes, updates or improvements include:

  • Preparation for 10.7.0 pre-release is underway and SRE is working on integrations for kes-agent and dmq-node for the release binaries, node NixOS service and OCI containers as appropriate. CI tests for Consensus db-tooling (i.e. db-analyser, db-truncater, db-synthesizer) are being added to a NixOS test run on Hydra to ensure that the bundled node and db-tools versions remain compatible.

  • Iterative deployments of 10.7.0 pre-release candidates to select pre-release environments are ongoing, with issues being reported back to developers.

  • Darwin CI build machine updates are underway along with some optimizations and fixes to reduce flaky Darwin platform bugs and noisy alerts as well as a refactor to reduce code complexity. A number of these improvements will appear in the next SRE biweekly update.

  • Loki logging has been added to more of our cardano-parts environments (i.e. cardano-playground and cardano-mainnet). Custom Loki dashboards are also being prepared to improve the Loki experience and will appear in the next cardano-parts PR.

Repository Work -- Merged

Cardano-monitoring

cardano-monitoring PR#4:

  • Adds Loki to playground, mainnet and networkteam monitoring servers
  • Raises max_outstanding_per_tenant to accommodate large dashboards w/o errors

cardano-monitoring PR#5:

  • Adjusts Loki log retention to a per-environment setting

Devx-ci

devx-ci PR#144:

  • Increases nofile soft/hard limit to avoid failures on higher nofile requirement builds like virtiofs virtualized images

Repository Work In Progress -- PRs and Branches

SRE Team Update

· 2 min read
John Lotoski
Service Reliability Engineer

High level summary

The SRE team continues work on Cardano environment improvements and general maintenance.

Some notable recent changes, updates or improvements include:

  • For this biweekly period, SRE has been heavily engaged in support activity for the new USDCx service, now publicly announced here.

  • The Dijkstra network was respun to facilitate new genesis parameter optimizations.

Repository Work -- Merged

Acropolis

acropolis PR#739:

  • Create GitHub Actions for omnibus workflow, including mithril bootstrapping to a target epoch at which point integration tests and a smoke test are run. The workflow is scheduled as well as manually triggerable.

acropolis PR#744:

  • Modify tests to not wait for keyboard input if a CI environment is detected.
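
A minimal sketch of this kind of CI detection is shown below. It relies on the CI environment variable, which GitHub Actions sets to true. Whether the acropolis tests use this exact variable is an assumption, and the function names are hypothetical.

```python
import os
from typing import Mapping, Optional


def running_in_ci(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the environment looks like a CI runner."""
    env = os.environ if env is None else env
    # Assumption: a truthy CI variable is a sufficient signal.
    return env.get("CI", "").strip().lower() in {"1", "true", "yes"}


def wait_for_keypress(prompt: str = "Press Enter to continue...",
                      env: Optional[Mapping[str, str]] = None) -> bool:
    """Block on keyboard input only when interactive; report whether we waited."""
    if running_in_ci(env):
        return False  # never block an unattended CI job
    input(prompt)
    return True
```

Passing the environment in as a parameter keeps the check testable without mutating the real process environment.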

acropolis PR#745:

  • Add Cargo cache usage for CI GitHub Actions.

USDCx-infra

master:

  • SRE contributed work to date.

Repository Work In Progress -- PRs and Branches

SRE Team Update

· 3 min read
John Lotoski
Service Reliability Engineer

High level summary

The SRE team continues work on Cardano environment improvements and general maintenance.

Some notable recent changes, updates or improvements include:

  • Cardano-node 10.5.4 has been released and 10.6.2 has been pre-released and deployed to all IOE release and pre-release environments.

  • New arm64 release artifacts and OCI images are now available with node 10.6.2 pre-release.

Repository Work -- Merged

Cardano-mainnet

cardano-mainnet PR#41:

  • Sets cardano-node release to 10.5.4. The private IOG pools, IOGP{2..4} have been retired. All private relays have been stopped as well as the retired block forging pool machines. Includes various improvements with cardano-parts release v2026-02-12. See the PR description for additional details.

Cardano-parts

cardano-parts release v2026-02-12:

  • This release updates cardano-node release to 10.5.4, cardano-node pre-release to 10.6.2, cardano-db-sync pre-release to 13.7.0.0 and cardano-faucet to 10.6. Dijkstra network support has been added to process-compose, recipes and scripts. The demo and demo-ng scripts have been substantially improved for Protocol Version 11 intra-era hard fork support, Constitutional Committee key voting, DRep registration, and constitution adoption. Other miscellaneous improvements and fixes are detailed in the PR description.

Cardano-playground

cardano-playground PR#53:

  • Sets cardano-node release to 10.5.4, cardano-node pre-release to 10.6.2, cardano-db-sync pre-release to 13.7.0.0 and cardano-faucet to 10.6. Dijkstra network support has been added to recipes and scripts. The demo and demo-ng scripts have been substantially improved for Protocol Version 11 intra-era hard fork support, Constitutional Committee key voting, DRep registration, and constitution adoption. Includes various improvements with cardano-parts release v2026-02-12. See the PR description for additional details.

cardano-playground PR#54:

  • Upgrades the cardano environment switching for recipes and scripts to a sourcing method instead of the prior GDB parent shell var modification approach which has become more fragile over time.

Cardano-node

cardano-node PR#6424:

  • Bumps cardano-node cabal version to 10.6.2
  • Bumps iohkNix for 10.6.2 configuration updates (see iohk-nix PR#609 for details)
  • Updates CI files for configuration changed in 10.6.2

cardano-node PR#6425:

  • Bumps iohk-nix for 10.5.4 preview checkpoint file, peer snapshot and ledger updates
  • Adjust CI files

Iohk-nix

iohk-nix PR#609:

  • Adds a preview checkpoint file
  • Adds mempool timeout docs and bumps MinNodeVersion to 10.6.2
  • Adds dijkstra network config
  • Updates per environment peer-snapshot file and useLedgerAfterSlot
  • Fixes mkConfigHtml function html spacing when using multi-line strings
  • Sets mkConfigHtml to use new default tracing for the genesis file inclusion check
  • Switches Dijkstra network to GenesisMode w/ peer-snapshot as default

Repository Work In Progress -- PRs and Branches

SRE Team Update

· 2 min read
John Lotoski
Service Reliability Engineer

High level summary

The SRE team continues work on Cardano environment improvements and general maintenance.

Some notable recent changes, updates or improvements include:

  • A cardano-node 10.5.4 release and 10.6.2 pre-release version are quite close. SRE has been iterating on candidate deployments for these versions to our infrastructure and offering feedback to relevant development teams where appropriate.

Repository Work -- Merged

Cardano-airgap

cardano-airgap PR#12:

  • Bumps disko to v1.13.0
  • Fixes running disko scripts in the iso environment after bumping nixpkgs by adding required deps via closure path reference to /etc/install-closure.
  • Adds an iso-versioning binary, available from the devShell and in the iso environment, which provides relevant component version info in centered GitHub md table format.
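
For readers unfamiliar with the centered GitHub markdown table format, the sketch below shows one way such a table could be rendered. It is not the iso-versioning implementation: the function name and column headers are hypothetical, and only the disko version in the usage example comes from the PR notes above.

```python
def centered_md_table(rows: list[tuple[str, str]],
                      headers: tuple[str, str] = ("Component", "Version")) -> str:
    """Render rows as a GitHub markdown table with centered columns."""
    # Column width is the longest cell in that column, header included.
    widths = [max(len(cell) for cell in col) for col in zip(headers, *rows)]

    def line(cells: tuple[str, str]) -> str:
        padded = (cell.center(w) for cell, w in zip(cells, widths))
        return "| " + " | ".join(padded) + " |"

    # ":---:" is the GitHub markdown marker for a centered column.
    separator = "|" + "|".join(":" + "-" * w + ":" for w in widths) + "|"
    return "\n".join([line(headers), separator, *(line(r) for r in rows)])


print(centered_md_table([("disko", "v1.13.0")]))
```

Padding the cells is cosmetic, since markdown renderers center the column based on the separator row alone, but it keeps the raw text readable in a terminal.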

Cardano-faucet

cardano-faucet PR#17:

  • Updates the flake inputs and Haskell dependencies of cardano-faucet. Specifically, upgrades cardano-api to 10.19 for cardano-node 10.6.x compatibility and fixes the resulting compiler errors. This prepares the faucet to keep functioning through the upcoming intra-era hard fork.

Iohk-nix

cardano-node-release/10.5.4 branch:

  • Provides a preview environment checkpoint file as well as per-environment peer snapshot file and "use ledger peers after" slot updates for cardano-node release 10.5.4.

Repository Work In Progress -- PRs and Branches