Skip to main content

59 posts tagged with "sre"

View All Tags

· 2 min read
John Lotoski

High level summary

The SRE team continues work on Cardano environment improvements and general maintenance.

Some notable recent changes, updates or improvements include:

  • A substantial amount of effort was placed into the Hydra CI build system during this biweekly period to investigate the root cause of aborted builds due to both logged invalid store paths and logged missing nar cache files. Nushell scripts were written to examine and repair specific closures as well as to walk all nix cache objects and proactively resolve any dangling narinfo files, effectively resolving the aborted builds. Script repair operations were parallelized to speed up the walk rate across the large object count bucket. The root cause was a cache truncation operation which purged a small percentage of objects filtered by oldest age and non-uniformly deleted narinfo and nar objects which needed to remain paired due to self-references. A more intelligent GC approach will be used in the future.

Repository Work -- Merged

Blockperf

  • Fixes a new tracing system blockperf implementation error for trace detail level. blockperf-pr-33

Capkgs

  • Re-adds regular hydraJob builds in addition to fetch-closure only builds to ensure the full jobset can be rebuilt from source. capkgs-commit-range

Cardano-airgap

  • Adds more boot options for better video driver support, including nouveau nomodeset fallback and open and closed Nvidia drivers. The dconf config file was updated to use the nixos modules declaration. Logout, shutdown, restart and similar gnome operations were fixed. Additional helper packages were added. See the PR header for details. cardano-airgap-pr-9

Devx-ci

  • Adds ci10, a x86_64-linux builder, to be repurposed later for Equinix metal migration. Sets narinfo-cache-positive-ttl back to default value, sets the default user nofile limit to 4096 from default of 1024 to avoid occasional nofile failures. Rekeys required group secrets to include the new machine, adds ci7, ci8 to the r2 tunnel. Adds a github-hydra-bridge-restarter service to detect when the bridge token has expired and auto-rotate within one minute of expiration. devx-ci-pr-135

Repository Work In Progress -- PRs and Branches

· 2 min read
John Lotoski

High level summary

The SRE team continues work on Cardano environment improvements and general maintenance.

Some notable recent changes, updates or improvements include:

  • An on-going, intermittent outage with our nix upstream cache storage provider has been investigated. While the issue still persists and we work with the provider to get it resolved, it appears to be isolated to traffic routing through a particular provider colocation. Installing a wireguard tunnel for our cache traffic to route around the affected colo has brought our build farm machines back to normal operation until the provider resolves the issue.

This biweekly is shorter than usual as SRE members attended Nixcon 2025 to stay sharp on nix skills, relevant technical knowledge and tooling that can benefit our IOE environments and operations. Additionally, the remaining time was skewed towards internal operations rather than feature development during this period. A new team member has also joined the SRE team!

Repository Work -- Merged

Devx-ci

  • Adds an independent pin for GH runner bumps, and adds use of an r2 wg tunnel to eu-central-1 to work around the problematic CF ARN colo reads. devx-ci-pr-134

Repository Work In Progress -- PRs and Branches

· 2 min read
John Lotoski

High level summary

The SRE team continues work on Cardano environment improvements and general maintenance.

Some notable recent changes, updates or improvements include:

  • An Oakhost cloud Mac M4 machine was on-boarded into the mixed Hetzner and prem Darwin CI environment. An internal documentation guide was written for fast Oakhost onboarding if a need to scale Darwin resources further presents itself.

  • Adawallet received a feature upgrade to to handle native tokens in a number of subcommands for adawallet accounts. The cardano-airgap image was updated to include this capability.

  • A new IOHK signing authority key for email signing.authority@iohk.io with GPG key fingerprint ending in 0xCD0171BC90, was placed into production for package signing. The public key can be found here.

Repository Work -- Merged

Adawallet

  • Adds multiasset support to adawallet in select contexts such as send-tx, drain-tx, bulk-drain-tx, migrate-wallet, import-utxos, export-utxos. When the multiasset option is used native tokens are aggregated into a single UTXO with the minimum amount of lovelace required and lovelace only UTXOs are handled separately. A future improvement would be to handle large native tokens collections which may hit transaction size limits. adawallet-pr-24

Cardano-airgap

Devx-ci

  • Adds oak-m4-1 machine to the build cluster with associated monitoring, secrets, hydra ssh and build machine defn. Updates darwin.sh with a -b, --bindir optional arg to allow an easier darwin bootstrapping deployment. Bumps opentofu grafana provider to 4.5.3, makes a common Hetzner linux module, smaller Hetzner linux variant specific modules and refactors the existing Hetzner linux machines under those. Moves hetzner-m1.nix to darwin-state-version.nix module used by both Hetzner and Oakhost darwin machines. Adds a few clarification comments to the distributed builds nixosModule. devx-ci-pr-133

Ops-lib

  • Bumps devShell and machines nix to 2.29.1, bumps deployed machines to nixpkgs 25.05 with a small patch for nixops to continue operating, bumps iohk-nix to current head and niv to latest sources.nix. Cleans up the ssh key overlay and removes sentry relevant package, modules and scripts as usage is deprecated. Makes misc required nixos-25.05 nixpkgs module updates. ops-lib-pr-136

Repository Work In Progress -- PRs and Branches

· 3 min read
John Lotoski

High level summary

The SRE team continues work on Cardano environment improvements and general maintenance.

Some notable recent changes, updates or improvements include:

  • With the exception of a few canary machines still running the legacy cardano-node tracing system, the majority of IOE playground and mainnet cardano-node machines are now running the new tracing system.

  • Adawallet received a feature upgrade to sign messages, enabling it to complete glacier drop claims for adawallet accounts. The cardano-airgap image was updated to include this capability.

Repository Work -- Merged

Adawallet

  • Bumps to node 10.5.1 for cardano-cli 10.11.0.0, adds cardano-signer v1.29.0 to the devShell, adds a yarn devShell, improves the cardano-hw-cli package by switching to default nodejs pkg and greatly simplifies the build. adawallet-pr-22

  • Add an adawallet sign-msg feature which signs with either payment or stake key, enabling glacier drop claims on adawallet accounts. Cleans up some more legacy cardano-hw-cli packaging and adds back bash auto-completion. adawallet-pr-23

Cardano-airgap

  • Updates adawallet for sign-msg support for glacier drop and adds cardano-signer and misc support packages and services to the devShell and iso. cardano-airgap-pr-6

Cardano-mainnet

  • This PR primarily upgrades all machines to the cardano-node new tracing system. It provides alert, dashboard and nix module upgrades for compatibility with the new tracing system. This PR includes improvements from cardano-parts release v2025-08-05. Additional details can be found in the PR description. cardano-mainnet-pr-38

Cardano-node

  • Adds the snapshot-converter binary to the nix overlay and the node OCI container. Adds documentation on how to use the snapshot-converter within the image for changing ledger state type. cardano-node-pr-6299

Cardano-parts

  • This cardano-parts release changes the default tracing system from legacy to the new cardano-node tracing system for deployed machines. See the release notes for details. cardano-parts-release-v2025-08-05

  • Updates cardano-signer to v1.29.0 which allows for Byron era address claims for Midnight Glacier drop. Bumps mithril unstable and adds some flakeModule cluster options for more service granularity. cardano-parts-release-v2025-08-14

Cardano-playground

  • This PR primarily upgrades all playground testnet machines with a few canary exceptions to the cardano-node new tracing system. It provides alert, dashboard and nix module upgrades for compatibility with the new tracing system. This PR includes improvements from cardano-parts release v2025-08-05. Additional details can be found in the PR description. cardano-playground-pr-45

Cardano-signer (nix packaged)

  • The nix packaging for upstream cardano-signer was updated for release 1.29.0 for byron address glacier drop compatibility and a GHA for ci build testing was added. cardano-signer-pr-2

Repository Work In Progress -- PRs and Branches

· 3 min read
John Lotoski

High level summary

The SRE team continues work on Cardano environment improvements and general maintenance.

Some notable recent changes, updates or improvements include:

  • Cardano-node 10.5.1 has now been promoted to a full release for mainnet use and has been deployed to all of IOE's node SRE testnet and mainnet clusters.

  • A new docker OCI tag of 10.5.1-docker is additionally available for any OCI image users who want to use the snapshot-converter binary from within the image. The snapshot converter can be found at path /usr/local/bin/snapshot-converter. New cardano-node OCI release images going forward will contain the snapshot converter to facilitate ledger backend state changes without having to rely on host level tooling or a fully ledger replay.

  • Cardano-parameters is a new repo which maintains the mainnet, preprod and preview protocol parameters with a daily update, as reported by the BlockFrost Cardano API service.

Repository Work -- Merged

Cardano-mainnet

  • Cardano-node release has been updated to 10.5.1. Rotates KES, cleans up 10.5.0 module code no longer needed and deploys select bootstraps with an EgressPollInterval modifier. cardano-mainnet-pr-37

Cardano-monitoring

  • A modernization and security update PR which updates most flake pins, patches grafana for CVE vulnerabilities, updates the opentofu-registry and adds filtering for alpha/beta versions, updates tofu resource declarations and recipes, migrates to SSH over SSM, fixes some race conditions on reboot and more. See the PR header for details. cardano-monitoring-pr-2

Cardano-node

  • For the purposes of a 10.5.1-docker OCI image tag, adds snapshot-converter to the nix overlay and bundles it into the cardano-node OCI image. Adds documentation with example command usage. cardano-node-compare

Cardano-parts

  • Cardano-node pre-release has been updated to 10.5.1. A nix packaged version of cardano-signer has been added, blockperf and credential-manager tools updated. Nix jobs for facilitating governance activities have been improved, easing operations. cardano-parts-release-v2025-07-23

  • Cardano-node release has been updated to 10.5.1. cardano-parts-release-v2025-07-25

Cardano-playground

  • Cardano-node pre-release has been updated to 10.5.1. Adds playground gov action support scripts, preview/preprod committee state changes and relevant gov action artifacts. Includes various improvements with cardano-parts release v2025-07-23. See the PR header for details. cardano-playground-pr-49

  • Cardano-node release has been updated to 10.5.1. Fixes a playground voting script to work with sops decrypt using bash array args; drops some deprecated code and updates cardano-book. cardano-playground-pr-50

Devx-ci

  • Nix-darwin was updated to 25.05 requiring updating of the Darwin guest bootstrap scripts as well as the Buildkite modules. Deprecated Hetzner Darwin materialization for a legacy ssh problem was removed and Hydra impermanence config was updated. devx-ci-compare

Repository Work In Progress -- PRs and Branches