68 posts tagged with "sre"

John Lotoski

High level summary

The SRE team continues work on Cardano environment improvements and general maintenance.

Some notable recent changes, updates or improvements include:

  • Cardano-node 10.6.0 has been pre-released, with the corresponding long-running SRE PRs now merged into this release! See the release notes for details.

  • The SRE team identified a ledger replay bug in the 10.6.0 release candidate whereby the legacy tracing system would no longer log ledger replay update statistics. A fix was implemented prior to tagging and pre-releasing.

  • Near the end of this biweekly reporting period, the preview network experienced a network partition. After an intense collaborative effort across multiple teams and the community, in which the SRE team participated from the beginning, the bug causing the partition was identified and a new cardano-node version, 10.5.2, was released to fix the issue. This version was deployed to the IOE preview machines immediately upon release and shortly afterwards to the rest of the IOE testnet and mainnet infra.

Repository Work -- Merged

Cardano-mainnet

  • Bump cardano-parts for v2025-11-18

  • Updated CloudFormation terraformState.nix and opentofu/cluster.nix for corresponding tagging updates

  • Added new required flake cluster attribute declaration for new required resource tagging

  • Added a matomo nix module prototype in prep for a legacy bitte cluster matomo migration to prod

  • Fixed script breakages caused by cardano-cli breaking changes

  • Adds a smash delisting

  • Rotates mainnet KES

  • Adjusts an alert for pool 1 infrequent forging threshold noise

    cardano-mainnet-pr-39

Cardano-parts

  • Bumps cardano-node pre-release to 10.6.0, mithril to 2543.1-hotfix and blockperf to a fix branch which includes a patch for proper blockperf configuration under the new tracing system

  • Added rsync ssm help bash function and alias to the common machine profile

  • Added peer snapshot files to the ops library function generateStaticHTMLConfigs

  • Added new flakeModule cluster.nix options of infra.generic costCenter, owner and project

  • Added zsh devShell command completion

  • Updated a number of nixosModules to support both new and legacy tracing systems as well as 10.6.0 and 10.5.1 configuration differences

  • Updated template CloudFormation terraform state and opentofu cluster resource definitions for corresponding tagging updates

  • Fixed template script breakages caused by cardano-cli breaking changes included in the 10.6.0 pre-release

  • Fixed the profile-blockperf.nix nixosModule new tracing system configuration

    cardano-parts-release-v2025-11-18

Cardano-playground

  • Added book config updates for 10.6.0 pre-release environments: preprod, preview

  • Added sanchonet environment configs for community disaster test participation

  • Added a "New Pool" document explainer at docs/explain/new-pool.md

  • Added new required flake cluster attribute declaration for new required resource tagging

  • Added a matomo nix module prototype in prep for a legacy bitte cluster matomo migration to prod

  • Added wireguard tunnel endpoints as temporary R2 colo http streaming/timeout bucket workarounds

  • Added misc improvements to playground scripts for governance voting

  • Updated CloudFormation terraform state and opentofu cluster resource declarations for corresponding tagging updates

  • Updated CI to a smaller representative machine subset

  • Updated preview, preprod and non-prod test forgers with KES rotation

  • Fixed script breakages caused by cardano-cli breaking changes

  • Voted on a preview and preprod governance action with drep/pools and CC

    cardano-playground-pr-51

Iohk-nix

  • Merges non-forger and forger configs, with the node handling differences internally based on forger status (i.e. PeerSharing, TargetNumberOfKnownPeers, TargetNumberOfRootPeers)

  • Includes peerSnapshotFile for all networks, now at v2

  • Allows SRV records for bootstrap resource definitions

  • Adjusts the default networking mode to p2p without explicit declaration, as p2p is the only mode for >= 10.6.0

  • Bumps minNodeVersion to 10.6.0 for default config changes

  • Testnet templates have been adjusted so the plutus v3 cost model has 251 parameters, matching mainnet, and a Dijkstra genesis has been added

    iohk-nix-pr-602

Repository Work In Progress -- PRs and Branches

John Lotoski

High level summary

The SRE team continues work on Cardano environment improvements and general maintenance.

Some notable recent changes, updates or improvements include:

  • A new cloud tagging resource effort is underway to better attribute and track project expenses, implemented largely via infrastructure-as-code updates with appropriate secrets handling.

  • As part of the SRE to dev feedback loop for the 10.6.0 pre-release candidate, a memory leak was identified in the candidate when the LMDB ledger backend was used. Further testing, deployment and debugging work helped identify the root cause and a fix was implemented.

  • SRE team facilitated the passage of a governance action on the preprod network, likely to be followed by a mainnet proposal and full community vote.

  • Aarch64 cardano-node nix builds, binary release artifacts and OCI containers will be added to the cardano-node repo in the near future.

  • SRE team is providing support to the Midnight Scavenger Mine project which is now live.

Repository Work -- Merged

Cardano-monitoring

  • Adds new resource tags to both CloudFormation and tofu resources: owner, project, costCenter

  • Updates the pre-existing organization and environment tags

  • Treats costCenter tag as secret and adds a corresponding secrets file

  • Updates justfile, CloudFormation state template and tofu cluster definition files to accommodate

  • Adds some default resource tofu definitions in prep for ipv6 and default resource tagging

    cardano-monitoring-pr-3
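
As an illustration of the tagging scheme described above, here is a minimal sketch in Python of merging public tag values with a secret costCenter value and checking that every required tag is present. The function names, the example values and the idea of masking costCenter before logging are assumptions made for this sketch; the actual repositories implement tagging in Nix, CloudFormation and OpenTofu definitions, not in Python.

```python
# Tag names taken from the update above; everything else is hypothetical.
REQUIRED_TAGS = ("organization", "environment", "owner", "project", "costCenter")


def build_tags(public_tags, secret_tags):
    """Merge public and secret tag values and verify every required tag is set."""
    tags = {**public_tags, **secret_tags}
    missing = [name for name in REQUIRED_TAGS if not tags.get(name)]
    if missing:
        raise ValueError("missing required tags: " + ", ".join(missing))
    return tags


def redact(tags, secret_names=("costCenter",)):
    """Return a copy that is safe to log, with secret tag values masked."""
    return {k: ("<redacted>" if k in secret_names else v) for k, v in tags.items()}


if __name__ == "__main__":
    tags = build_tags(
        {"organization": "example-org", "environment": "testnet",
         "owner": "sre", "project": "monitoring"},
        {"costCenter": "cc-123"},  # would come from a secrets file in practice
    )
    print(redact(tags))
```

Keeping the secret value in a separate mapping mirrors the idea of a dedicated secrets file, so the public tag definitions can be committed without exposing the cost center.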

Cardano-perf

  • Adds new resource tags to both CloudFormation and tofu resources: owner, project, costCenter

  • Updates the pre-existing organization and environment tags

  • Treats costCenter tag as secret and adds a corresponding secrets file

  • Updates justfile, CloudFormation state template and tofu cluster definition files to accommodate

  • Adds some default resource tofu definitions in prep for ipv6 and default resource tagging

    cardano-perf-pr-6

Ouroboros-network-ops (Ready to merge -- waiting to align with next cardano-parts release)

  • Cardano-parts was bumped from v2025-06-24 to the post-v2025-08-14 release at next-2025-08-14

  • Updates for breaking changes were applied.

  • Adds new resource tags to both CloudFormation and tofu resources: owner, project, costCenter

  • Updates the pre-existing organization and environment tags

  • Treats costCenter tag as secret and adds a corresponding secrets file

  • Updates justfile, CloudFormation state template and tofu cluster definition files to accommodate

    ouroboros-network-ops-pr-30

Repository Work In Progress -- PRs and Branches

John Lotoski

High level summary

The SRE team continues work on Cardano environment improvements and general maintenance.

Some notable recent changes, updates or improvements include:

  • IOE participated in a community-driven Sanchonet network chain disaster recovery test event to purposely break Sanchonet and recover using mechanisms and tools discussed in CIP-0135, including db-truncater and db-synthesizer.

  • Faster test deployment iteration of a 10.6.0 pre-release candidate is underway on some preview and preprod testnet machines, with a tight feedback loop between the SRE and dev teams for observations and debugging when issues are found.

  • A new cardano-tracer OCI container will be provided with the upcoming 10.6.0 pre-release now that the new tracing system will be the default.

  • SRE team facilitated the passage of a governance action on the preview network, soon to be submitted to preprod and likely followed by a mainnet proposal and full community vote.

  • SRE team has begun providing some additional support to the Midnight Scavenger Mine project as needed until the Scavenger Mine phase completes.

Repository Work -- Merged

Cardano-node

  • This PR includes various SRE related changes for 10.6.0 pre-release readiness:

    • Bumps iohk-nix for config updates from iohk-nix PR#602
    • Fixes configuration change related CI checks.
    • Merges bp and non-bp configurations into a single config whereby ouroboros-network now automatically determines PeerSharing and Target* parameters which previously required being explicitly declared.
    • The new tracing system is now set as the default configuration; the legacy tracing system config is still made available.
    • The mainnet default topology configuration now includes a peerSnapshot declaration for making testing of GenesisMode more convenient.
    • Adjusts OCI containers for the new config setups and also includes peer-sharing configs for each network
    • Updates the nixos cardano-node service for the deprecation of useNewTopology given P2P is now the only networking mode as of 10.6.0.
    • Updates the nixos cardano-node service for new tracerSocketNetworkAccept and tracerSocketNetworkConnect cardano-tracer connection options.
    • Updates the nixos cardano-node service to support SRV peer records.
    • Updates the nixos cardano-tracer service for option name changes of acceptingSocket to acceptAt and connectingToSocket to connectTo; related workbench services were also updated accordingly.
    • For the binary releases, the cardano-submit-api config and peer-sharing config were added.
    • The default cardano-submit-api config was made compatible with the new tracing system.

    cardano-node-pr-6300
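
The cardano-tracer option renames listed above (acceptingSocket to acceptAt, connectingToSocket to connectTo) can be pictured as a small key migration. The sketch below is a hypothetical Python helper operating on a plain dictionary; it is not part of the nixos service, and the example socket path is made up.

```python
# Old -> new option names, as listed in the update above.
RENAMED_OPTIONS = {
    "acceptingSocket": "acceptAt",
    "connectingToSocket": "connectTo",
}


def migrate_tracer_options(options):
    """Return a copy of `options` with legacy option names replaced.

    Raises ValueError if an option is set under both its old and new name,
    since it would be ambiguous which value should win.
    """
    migrated = {}
    for name, value in options.items():
        new_name = RENAMED_OPTIONS.get(name, name)
        if new_name in migrated:
            raise ValueError(f"option {new_name!r} is set under both old and new names")
        migrated[new_name] = value
    return migrated


if __name__ == "__main__":
    legacy = {"acceptingSocket": "/run/example/tracer.socket", "enable": True}
    print(migrate_tracer_options(legacy))
```

Options that were not renamed pass through unchanged, so the helper can be applied to a whole option set.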

Repository Work In Progress -- PRs and Branches

John Lotoski

High level summary

The SRE team continues work on Cardano environment improvements and general maintenance.

Some notable recent changes, updates or improvements include:

  • A sanchonet relay and pool have been spun up to participate in the community-driven disaster recovery testing happening in the near future.

  • Cardano-submit-api configs will be updated to be compatible with both the legacy and new tracing system in the next cardano-node release.

  • A legacy Matomo deployment is being migrated to a newer stack so deprecated resources can be turned off in the near future.

Repository Work -- Merged

Capkgs

  • Adds an exclusions.json file to exclude packages that are known not to build, with compatibility code added to packages.cr and a new justfile recipe: filter-packages. The exclusions file was populated with all currently failing evals along with a reason. Go package exclusions were re-added after the missing hydra go deps were re-cached after working around a failing test certificate during haskell.nix bootstrap tests. capkgs-pr-7
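
To make the exclusion mechanism concrete, here is a minimal Python sketch of filtering a package list against an exclusions file. It assumes, purely for illustration, that exclusions.json is a JSON object mapping a package name to the reason it is excluded; the real file layout and the Crystal code in packages.cr may differ, and the package names below are invented.

```python
import json


def filter_packages(packages, exclusions_json):
    """Split `packages` into those to build and those skipped with a reason.

    `exclusions_json` is assumed to be a JSON object of the form
    {"<package name>": "<reason>"}; this layout is hypothetical.
    """
    exclusions = json.loads(exclusions_json)
    kept = [name for name in packages if name not in exclusions]
    skipped = {name: exclusions[name] for name in packages if name in exclusions}
    return kept, skipped


if __name__ == "__main__":
    example = '{"pkg-b": "fails to evaluate"}'
    kept, skipped = filter_packages(["pkg-a", "pkg-b", "pkg-c"], example)
    print("build:", kept)
    print("skipped:", skipped)
```

Recording a reason next to each excluded package makes it easier to revisit an exclusion later, for instance once a missing dependency has been re-cached.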

Repository Work In Progress -- PRs and Branches

John Lotoski

High level summary

The SRE team continues work on Cardano environment improvements and general maintenance.

Some notable recent changes, updates or improvements include:

  • A substantial amount of effort went into the Hydra CI build system during this biweekly period to investigate the root cause of aborted builds, which logged both invalid store paths and missing nar cache files. Nushell scripts were written to examine and repair specific closures, as well as to walk all nix cache objects and proactively resolve any dangling narinfo files, effectively resolving the aborted builds. Script repair operations were parallelized to speed up the walk rate across the bucket's large object count. The root cause was a cache truncation operation which purged a small percentage of objects filtered by oldest age and non-uniformly deleted narinfo and nar objects that needed to remain paired due to self-references. A more intelligent GC approach will be used in the future.
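
The core check behind finding dangling narinfo files can be sketched as follows. This is a simplified Python illustration, not the Nushell scripts mentioned above: it models the cache as an in-memory mapping from object key to content, reads the URL field of each narinfo, and reports narinfo keys whose nar object is missing. The function names and the sample keys are hypothetical, and a real walk over a large bucket would list objects remotely and run in parallel.

```python
def nar_url(narinfo_text):
    """Return the value of the URL field of a narinfo document, or None."""
    for line in narinfo_text.splitlines():
        key, sep, value = line.partition(": ")
        if sep and key == "URL":
            return value.strip()
    return None


def find_dangling_narinfos(objects):
    """Return sorted narinfo keys whose referenced nar object is absent.

    `objects` maps a cache key to its content; only narinfo contents are read.
    """
    dangling = []
    for key, body in objects.items():
        if not key.endswith(".narinfo"):
            continue
        url = nar_url(body)
        if url is None or url not in objects:
            dangling.append(key)
    return sorted(dangling)


if __name__ == "__main__":
    cache = {
        "aaa.narinfo": "StorePath: /nix/store/aaa-example\nURL: nar/1.nar.xz\n",
        "nar/1.nar.xz": "",
        "bbb.narinfo": "StorePath: /nix/store/bbb-example\nURL: nar/2.nar.xz\n",
    }
    print(find_dangling_narinfos(cache))
```

A narinfo without its nar advertises a store path the cache can no longer serve, which is why the two objects need to be deleted or kept as a pair.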

Repository Work -- Merged

Blockperf

  • Fixes a new tracing system blockperf implementation error for trace detail level. blockperf-pr-33

Capkgs

  • Re-adds regular hydraJob builds in addition to fetch-closure only builds to ensure the full jobset can be rebuilt from source. capkgs-commit-range

Cardano-airgap

  • Adds more boot options for better video driver support, including nouveau nomodeset fallback and open and closed Nvidia drivers. The dconf config file was updated to use the nixos modules declaration. Logout, shutdown, restart and similar gnome operations were fixed. Additional helper packages were added. See the PR header for details. cardano-airgap-pr-9

Devx-ci

  • Adds ci10, an x86_64-linux builder, to be repurposed later for Equinix metal migration. Sets narinfo-cache-positive-ttl back to its default value and raises the default user nofile limit from the default of 1024 to 4096 to avoid occasional nofile failures. Rekeys required group secrets to include the new machine and adds ci7 and ci8 to the r2 tunnel. Adds a github-hydra-bridge-restarter service to detect when the bridge token has expired and auto-rotate within one minute of expiration. devx-ci-pr-135

Repository Work In Progress -- PRs and Branches