72 posts tagged with "sre"

SRE Team Update

· 2 min read
John Lotoski
Service Reliability Engineer

High level summary

The SRE team continues work on Cardano environment improvements and general maintenance.

Some notable recent changes, updates or improvements include:

  • The IOG private mainnet pools were retired this week. The IOG1 public stakepool remains active and forging.

  • An internal Dijkstra network was spun up for testing of the upcoming intra-era hard fork, followed by Dijkstra hard fork testing.

Repository Work -- Merged

Cardano-airgap

cardano-airgap PR#11:

  • Updates to nixpkgs 25.11
  • Updates adawallet to a nixpkgs also at 25.11 and fixes docopts
  • Bumps capkgs and corresponding bech32 package

Cardano-node

cardano-node PR#6401:

  • Bumps iohkNix flake input and adjusts configuration files for new tracing system parameter changes.

Cardano-parts

cardano-parts PR#78:

  • Adds CI tests for process-compose validation of node and db-sync stacks on the public networks.

Devx-ci

devx-ci PR#140:

  • Provides improvements to hydra-tools, including support for multiple GitHub organizations and GitHub app installations.

Repository Work In Progress -- PRs and Branches

SRE Team Update

· One min read
John Lotoski
Service Reliability Engineer

High level summary

The SRE team continues work on Cardano environment improvements and general maintenance.

Much of the SRE team is on vacation during this biweekly update.

Happy holidays to all of the Cardano community!

Repository Work -- Merged

Capkgs

capkgs Range:

  • Updates the content-addressed package repository CI job to use a netrc token for handling GitHub API rate limits. URL redirection handling is also added.
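
To illustrate the netrc approach mentioned above, here is a minimal Python sketch. It is not the capkgs CI job itself. It assumes a netrc entry for api.github.com whose password field holds the token; the helper names `github_auth_header` and `build_request` are hypothetical.

```python
import netrc
import urllib.request


def github_auth_header(netrc_path, host="api.github.com"):
    """Return an Authorization header built from a netrc entry, or {} if none exists."""
    auth = netrc.netrc(netrc_path).authenticators(host)
    if auth is None:
        return {}
    _login, _account, token = auth  # assumption: the token is stored as the password
    return {"Authorization": f"Bearer {token}"}


def build_request(url, netrc_path):
    """Build a request carrying the netrc token, if one is configured for GitHub."""
    return urllib.request.Request(url, headers=github_auth_header(netrc_path))
```

Passing the result of `build_request` to `urllib.request.urlopen` would send the authenticated request, and `urlopen` follows HTTP redirects by default.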

Devx-ci

devx-ci PR#139:

  • Adds extra x86_64-linux build farm machines ci11 and ci12 to the build cluster and re-keys secrets

Repository Work In Progress -- PRs and Branches

SRE Team Update

· 2 min read
John Lotoski
Service Reliability Engineer

High level summary

The SRE team continues work on Cardano environment improvements and general maintenance.

Some notable recent changes, updates or improvements include:

  • Starting with the next node release version 10.6.2, release binaries and OCI images will be generated for arm64 architectures.

Repository Work -- Merged

Acropolis

acropolis PR#482:

  • Removes unused packages to free up disk space for running CI tests

acropolis PR#483:

  • Schedules a run of the omnibus bootstrap process every morning at 00:15
  • Fails the job if the process does not complete within 3 hours
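
The "daily at 00:15, fail after 3 hours" behavior can be sketched in Python as follows. This is an illustration only, not the acropolis CI configuration. The function names are hypothetical, and the time zone is whatever the caller's `datetime` carries, since the source does not state one.

```python
import subprocess
from datetime import datetime, time, timedelta

RUN_AT = time(hour=0, minute=15)   # scheduled start time from the update above
TIME_LIMIT = timedelta(hours=3)    # job is considered failed past this limit


def next_run(now):
    """Return the next 00:15 strictly after `now`."""
    candidate = datetime.combine(now.date(), RUN_AT, tzinfo=now.tzinfo)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def run_with_limit(cmd, limit=TIME_LIMIT):
    """Run cmd; return False if it exits non-zero or exceeds the time limit."""
    try:
        result = subprocess.run(cmd, timeout=limit.total_seconds())
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising TimeoutExpired
        return False
    return result.returncode == 0
```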

Cardano-node

cardano-node PR#6376:

  • This PR improves support for multiple arches in the following ways:

    1. Adds aarch64-linux nix packages, including musl static and OCI tarball generation package variants;

    2. Bumps GHC from 9.6.6 -> 9.6.7 as well as the cardano-automation flake input for aarch64-linux support;

    3. Updates the release-ghcr GHA workflow to produce linux multi-arch manifest OCI and corresponding release images which auto-resolve on container pull to the appropriate arch (amd64 or arm64);

    4. Updates the release-upload GHA workflow to produce new linux and darwin aarch64 artifacts. It also produces new default OCI images with names aligned to the OCI/GOARCH standard.

    More details available in the PR description.
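
As a rough illustration of how a multi-arch manifest resolves to the right image on pull, the Python sketch below picks an entry from an OCI-style image index by platform. It is a simplification of what container runtimes do, not code from the PR. The function name, the mapping table and the placeholder digests are all hypothetical.

```python
import platform

# GOARCH-style names used in OCI platform fields, keyed by common machine names.
MACHINE_TO_GOARCH = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def select_manifest(index, machine=None, os_name="linux"):
    """Return the digest of the index entry matching the given machine and OS."""
    machine = machine or platform.machine()
    arch = MACHINE_TO_GOARCH.get(machine.lower())
    if arch is None:
        raise ValueError(f"unsupported machine type: {machine}")
    for entry in index["manifests"]:
        plat = entry.get("platform", {})
        if plat.get("architecture") == arch and plat.get("os") == os_name:
            return entry["digest"]
    raise LookupError(f"no manifest for {os_name}/{arch}")


# Placeholder index shaped like an OCI image index; the digests are not real.
EXAMPLE_INDEX = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {"digest": "sha256:placeholder-amd64",
         "platform": {"architecture": "amd64", "os": "linux"}},
        {"digest": "sha256:placeholder-arm64",
         "platform": {"architecture": "arm64", "os": "linux"}},
    ],
}
```

Calling `select_manifest(EXAMPLE_INDEX)` with no machine argument uses the host's own architecture, which mirrors the auto-resolve behavior described above.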

cardano-node PR#6391:

  • Adds db-analyser, db-synthesizer, and db-truncater to the Cardano Node container image.

Devx-ci

devx-ci PR#136:

  • Updates hydra version from 2.28 -> 2.32 + issue patch and explicitly allows IFD
  • Applies nix version 2.32-maintenance to hydra and linux builders
  • Adds ssh stabilization params to the hydra module for connection to remote builders
  • Disables nixos optimise on hydra to avoid GC performance degradation
  • Removes the r2 wireguard tunnel from the remote builders as it is not currently required

Repository Work In Progress -- PRs and Branches

SRE Team Update

· 3 min read
John Lotoski
Service Reliability Engineer

High level summary

The SRE team continues work on Cardano environment improvements and general maintenance.

Some notable recent changes, updates or improvements include:

  • Within approximately 1 day of the preview network partition mentioned in the last biweekly SRE update, mainnet experienced a similar partition. Again, after an intense multi-team and community collaborative effort in which the SRE team participated from the beginning, a new cardano-node version 10.5.3 was released to fix the issue. All network participants on problematic versions were encouraged to upgrade immediately so that the partition would resolve itself as the majority of stake aligned under node versions with proper ledger hash handling. Indeed, with the quick reaction of the community as a whole, the network partition resolved within approximately 14 hours. Various after action reports are available detailing the event, such as Pi Lanningham's Poison Piggy - After Action Report.

  • Subsequent to the mainnet fork event above, cardano-node 10.6.1 has also been pre-released and deployed to IOE pre-release infrastructure. The SRE team has also been participating in other after action investigation activity.

Repository Work -- Merged

Cardano-mainnet

  • Bumps and deploys cardano-node release to 10.5.3

  • Fixed Mithril scripts broken by breaking changes in the upstream Mithril aggregator API

  • Updates the save-ssh-config recipe to match opentofu ssh config artifact output

  • Cleaned up flake.nix inputs and some colmena modules code

  • Modifies two bootstrap EBS gp3 volumes to accommodate some data handling work

    cardano-mainnet-pr-40

Cardano-parts

  • Bumps cardano-node release to 10.5.3, pre-release to 10.6.1, and cardano-db-sync pre-release to 13.6.0.6

  • Mithril script code for the node entrypoint, systemd ExecStartPre and process-compose jobs was fixed to accommodate breaking upstream API changes

  • The pkgs.nix flakeModule was refactored to make switching between localFlake and capkgs pins easier

    cardano-parts-release-v2025-12-04

Cardano-playground

  • Bumps and deploys cardano-node release to 10.5.3, pre-release to 10.6.1, and cardano-db-sync pre-release to 13.6.0.6

  • Updates the cardano-book for the corresponding cardano-node release and pre-release updates

  • Adds a metrics scraper nixosModule for perf team custom data analysis

  • Fixed Mithril scripts broken by breaking changes in the upstream Mithril aggregator API (via cardano-parts pin update)

  • Updates the save-ssh-config recipe to match opentofu ssh config artifact output

  • Cleaned up flake.nix inputs and some colmena modules code

    cardano-playground-pr-52

Repository Work In Progress -- PRs and Branches

SRE Team Update

· 4 min read
John Lotoski
Service Reliability Engineer

High level summary

The SRE team continues work on Cardano environment improvements and general maintenance.

Some notable recent changes, updates or improvements include:

  • Cardano-node 10.6.0 has been pre-released with the corresponding long running SRE PRs now merged into this release! See the release notes for details.

  • The SRE team identified a ledger replay bug in the 10.6.0 release candidate whereby the legacy tracing system would no longer log ledger replay update statistics. A fix was implemented prior to tagging and pre-releasing.

  • Near the end of this biweekly reporting period, the preview network experienced a network partition. After an intense multi-team and community collaborative effort in which the SRE team participated from the beginning, the bug causing the partition was identified and a new cardano-node version 10.5.2 was released to fix the issue. This version was deployed to the IOE preview machines immediately upon release and shortly afterwards to the rest of the IOE testnet and mainnet infra.

Repository Work -- Merged

Cardano-mainnet

  • Bump cardano-parts for v2025-11-18

  • Updated CloudFormation terraformState.nix and opentofu/cluster.nix for corresponding tagging updates

  • Added new required flake cluster attribute declaration for new required resource tagging

  • Added a matomo nix module prototype in prep for a legacy bitte cluster matomo migration to prod

  • Fixed script breakages caused by cardano-cli breaking changes

  • Adds a smash delisting

  • Rotates mainnet KES

  • Adjusts the infrequent forging alert threshold for pool 1 to reduce noise

    cardano-mainnet-pr-39
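
For context on the KES rotation item above, the arithmetic behind when a rotation is due can be sketched in Python. This is a minimal sketch that assumes the mainnet Shelley genesis values of 129600 slots per KES period and 62 maximum KES evolutions; the function names and any slot numbers passed in are hypothetical.

```python
SLOTS_PER_KES_PERIOD = 129_600  # assumed: mainnet Shelley genesis slotsPerKESPeriod
MAX_KES_EVOLUTIONS = 62         # assumed: mainnet Shelley genesis maxKESEvolutions


def kes_period(slot):
    """Return the KES period that contains the given absolute slot."""
    return slot // SLOTS_PER_KES_PERIOD


def periods_remaining(current_slot, cert_start_period):
    """KES periods left before a certificate issued at cert_start_period expires."""
    expiry = cert_start_period + MAX_KES_EVOLUTIONS
    return max(0, expiry - kes_period(current_slot))
```

Under these assumed values and one-second slots, a period lasts 1.5 days, so a key is usable for 62 × 1.5 = 93 days before it must be rotated.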

Cardano-parts

  • Bumps cardano-node pre-release to 10.6.0, mithril to 2543.1-hotfix and blockperf to a fix branch that includes a patch for proper blockperf configuration under the new tracing system

  • Added rsync ssm help bash function and alias to the common machine profile

  • Added peer snapshot files to the ops library function generateStaticHTMLConfigs

  • Added new flakeModule cluster.nix options of infra.generic costCenter, owner and project

  • Added zsh devShell command completion

  • Updated a number of nixosModules to support both new and legacy tracing systems as well as 10.6.0 and 10.5.1 configuration differences

  • Updated template CloudFormation terraform state and opentofu cluster resource definitions for corresponding tagging updates

  • Fixed template script breakages caused by cardano-cli breaking changes included in the 10.6.0 pre-release

  • Fixed the profile-blockperf.nix nixosModule new tracing system configuration

    cardano-parts-release-v2025-11-18

Cardano-playground

  • Added book config updates for 10.6.0 pre-release environments: preprod, preview

  • Added sanchonet environment configs for community disaster test participation

  • Added a "New Pool" document explainer at docs/explain/new-pool.md

  • Added new required flake cluster attribute declaration for new required resource tagging

  • Added a matomo nix module prototype in prep for a legacy bitte cluster matomo migration to prod

  • Added wireguard tunnel endpoints as a temporary workaround for R2 colo bucket http streaming/timeout issues

  • Added misc improvements to playground scripts for governance voting

  • Updated CloudFormation terraform state and opentofu cluster resource declarations for corresponding tagging updates

  • Updated CI to a smaller representative machine subset

  • Updated preview, preprod and non-prod test forgers with KES rotation

  • Fixed script breakages caused by cardano-cli breaking changes

  • Voted on a preview and preprod governance action with drep/pools and CC

    cardano-playground-pr-51

Iohk-nix

  • Merges non-forger and forger configs, with node handling differences internally based on forger status (i.e., PeerSharing, TargetNumberOfKnownPeers, TargetNumberOfRootPeers)

  • Includes peerSnapshotFile for all networks, now at v2

  • Allows SRV records for bootstrap resource definitions

  • Adjusts the default networking mode to p2p without explicit declaration, as p2p is the only mode for >= 10.6.0

  • Bumps minNodeVersion to 10.6.0 for default config changes

  • Testnet templates have been adjusted so that the plutus v3 cost model params match mainnet at 251 parameters, and a Dijkstra genesis has been added

    iohk-nix-pr-602

Repository Work In Progress -- PRs and Branches