33 posts tagged with "sre"

· One min read
John Lotoski

High level summary

During the lightly staffed holiday period for node SRE, the emphasis was on maintaining environment stability and on tuning and resolving noisy alerts.

Investigation into and testing around the following two topics also started during this period:

  • Ledger snapshots causing a small number of missed slots for forgers on mainnet: ouroboros-consensus-issue-868

  • A rare cardano-node file descriptor leak, with a more detailed description here

· 2 min read
John Lotoski

High level summary

The SRE team continues work on cardano environment improvements and general environment maintenance.

Some notable recent changes, updates or improvements include:

  • A new repository was created which enables agile deployment of EC2 monitoring servers, compatible with OpenTofu grafana and mimir providers: cardano-monitoring
  • The govtool backend swagger interface was nix flake packaged and deployed for Voltaire private chain testing usage
  • Grafana cloud monitoring stacks were migrated to new EC2 cardano-monitoring servers
  • Cardano-db-sync state snapshots now support client range requests, details here
  • In addition to the grafana metrics centralized on the monitoring servers, sysstat-collected system metrics are now available locally on all cluster machines at high time resolution
  • Code changes required due to repository migrations to IntersectMBO have largely been completed
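Range request support for the db-sync state snapshots means a client can resume an interrupted download instead of starting over. A minimal sketch of the client side follows, using the standard HTTP `Range` header. The snapshot URL and local file name are hypothetical placeholders and do not come from the post.

```python
import os
import urllib.request


def build_resume_request(url: str, partial_path: str) -> urllib.request.Request:
    """Build a GET request that asks only for the bytes not yet on disk."""
    request = urllib.request.Request(url)
    # Bytes already downloaded; 0 if the partial file does not exist yet.
    offset = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    if offset > 0:
        # "bytes=N-" requests everything from byte offset N to the end.
        request.add_header("Range", f"bytes={offset}-")
    return request


# Hypothetical URL and file name, for illustration only. No request is sent.
req = build_resume_request(
    "https://example.invalid/db-sync-snapshot.tgz", "db-sync-snapshot.tgz.part"
)
print(req.get_header("Range"))
```

A server that honors the header replies with `206 Partial Content`. A plain `200` reply means the range was ignored and the full body is being sent, so the client should overwrite the partial file rather than append to it.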

Lower level summary

Auth-keys-hub

Cardano-monitoring

  • A new repository enabling agile deployment of EC2 monitoring servers, compatible with OpenTofu grafana and mimir providers: cardano-monitoring

Cardano-parts

  • Migrate from grafana cloud monitoring to EC2 monitoring, add resource tagging support, a declarative Route53 CNAME list, and additional improvements and fixes: cardano-parts-pull-25
  • Improve ssh key handling and edge cases, resolve miscellaneous issues, and add gp3 IOPS and throughput support for OpenTofu: cardano-parts-pull-26

Cardano-playground

· 2 min read
John Lotoski

High level summary

The SRE team continues work on cardano environment improvements and general environment maintenance.

Some notable recent changes, updates or improvements include:

  • Sanchonet was respun to cardano-node 8.7.0-pre, and upgraded to cardano-node 8.7.1-pre shortly afterwards
  • Cardano-node 8.7.2 was released and all environments were then upgraded to 8.7.2
  • Cardano-parts deployed machines were upgraded to nixpkgs 23.11 and nix 2.19.3
  • Environments deployed with cardano-parts resources were switched from Terraform to OpenTofu

Lower level summary

Capkgs

  • Updated for cardano-node 8.7.2 and process-compose packages: capkgs-compare

Cardano-parts

Cardano-ops

Cardano-playground

Cardano-world

  • Sanchonet update PR: cardano-world-pull-111
    • Merge the long running sanchonet-updated branch
    • Migrate explorers from ziti to wireguard tunnel usage
    • Remove remaining ziti code and provisioned resources
    • Retire remaining nomad jobs in favor of the cardano-playground environments
    • Downsize the cluster in favor of the cardano-playground environments

Iohk-nix

· 4 min read
John Lotoski

High level summary

The SRE team continues work on cardano environment improvements and general environment maintenance.

Some notable recent changes, updates or improvements include:

  • The cardano-node nixos service now supports SIGHUP p2p topology reloading when the useSystemdReload option is enabled

Lower level summary

Capkgs

  • Update cardano-db-sync and offchain-metadata-tools package paths and/or references: capkgs-compare

Cardano-node

  • Optionally have cardano-node nixos service utilize SIGHUP p2p topology reload: cardano-node-pull-5537
    • Creates a useSystemdReload bool option for the cardano-node nixos service
    • This will move the topology file(s) to /etc/cardano-node/topology-$i.yaml and inject systemd reload hooks for p2p configured cardano-node instances
    • Moving topology files to /etc also allows for manual topology updates when a quick test is needed and full service re-deployment isn't desired
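To illustrate the general reload-on-SIGHUP pattern that the useSystemdReload option relies on, here is a minimal sketch of a process that re-reads a topology file when it receives the signal. This is not cardano-node's implementation. The `TopologyHolder` class, the `install_reload_handler` helper, and the assumption that the file parses as JSON are all hypothetical choices made for the example.

```python
import json
import signal


class TopologyHolder:
    """Keeps the latest parsed topology and re-reads it on demand."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.topology = self._read()

    def _read(self) -> dict:
        # Assumption for this sketch: the topology file is valid JSON.
        with open(self.path, encoding="utf-8") as handle:
            return json.load(handle)

    def handle_sighup(self, signum, frame) -> None:
        # (signum, frame) is the handler signature the signal module expects.
        # A failed read or parse keeps the previous topology instead of crashing.
        try:
            self.topology = self._read()
        except (OSError, ValueError) as error:
            print(f"reload failed, keeping old topology: {error}")


def install_reload_handler(holder: TopologyHolder) -> None:
    # SIGHUP is only available on Unix-like systems.
    signal.signal(signal.SIGHUP, holder.handle_sighup)
```

With a handler like this installed, editing the file and sending `kill -HUP <pid>` picks up the new topology without restarting the process. That is the property that makes quick manual topology tests possible without a full redeployment.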

Cardano-parts

  • Adds a metadata server profile and a number of other features and improvements: cardano-parts-pull-20
    • Adds a new metadata-service profile
    • Adds metadata service and pkg configuration options for cardano-groups to utilize the metadata-server profile
    • Adds a cardano-webserver profile for multiple virtualHosts and TLS ACME server aliases for a cluster's static needs, with each cached behind varnish
    • Adds extra node list producers and public producers for cardano-node-topology profile
    • Adds delegation amounts to cardano-postgres psql prepared query show_pools_block_history_in_epoch
    • Adds select systemd metrics reporting to grafana-agent profile
    • Adds a bookRelay multivalue DNS option to disambiguate with groupRelay multivalue DNS
    • Adds an opsLib library to the cardano-parts lib flakeModule and refactors some common code into it
    • Adds support for sops secret traversing from target path up instead of cwd up, thereby supporting secrets use-cases outside of the repo
    • Adds job-gen-env-config for both release and pre-release configuration files to support configuration book generation
    • Adds support for grafana recording rules in the template files
    • Improves cardano-group profile handling of producers with respect to multiple instance nodes
    • Improves grafana-agent profile metrics handling for multi-instance cardano-node servers
    • Improves smash service preStart handling while waiting for a node socket
    • Updates Justfile for ERA_CMD demo support
    • Migrates default grafana cloud node exporter, varnish alert and recording rules to grafana alert and recording rule templates
    • Defaults to using an updated systemd reload nixos service feature for p2p networks in cardano-group profile
    • Defaults cardano-postgres profile psqlrc use to false
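The change to sops secret traversal (searching upward from the target path rather than from the current working directory) can be sketched as a simple parent-directory walk. This is an illustration of the idea only, not the cardano-parts code. The function name `find_config_upwards` is hypothetical, and the default of `.sops.yaml` as the file being searched for is an assumption.

```python
from pathlib import Path
from typing import Optional


def find_config_upwards(target: Path, name: str = ".sops.yaml") -> Optional[Path]:
    """Return the nearest `name` at or above the target's directory, if any."""
    start = target.resolve()
    # Begin in the directory containing the target file, not in the cwd.
    directory = start if start.is_dir() else start.parent
    for candidate_dir in (directory, *directory.parents):
        candidate = candidate_dir / name
        if candidate.is_file():
            return candidate
    return None
```

Because the walk starts at the secret's own location, a secret stored outside the repository still finds the configuration next to it, regardless of the directory the command is run from.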

Cardano-playground

  • Adds a new testnet metadata server, cluster webserver, and other improvements: cardano-playground-pull-6
    • Adds a new metadata server
    • Adds a new webserver for the cluster's static virtualhost needs
    • Adds support for sops secret traversing from target path up instead of cwd up, thereby supporting secrets use-cases outside of the repo
    • Adds systemd metrics monitoring to the cluster
    • Resizes sanchonet machines to support the growing chain
    • Completes migration of preprod from world
    • Updates groups to utilize both bookRelay and groupRelay multivalue DNS attributes
    • Updates Justfile for ERA_CMD demo support
    • Defaults to using an updated systemd reload nixos service feature for p2p networks in cardano-group profile
    • Migrates book static code to playground from world, with refactor, cleanup and updates
    • Migrates default grafana cloud node exporter, varnish alert and recording rules to declarative grafana alert and recording rules

Offchain-metadata-tools

  • Adds db password option with obfuscation plus misc improvements: offchain-metadata-tools-pull-61
    • Adds db password connection option and obfuscates passwords in output for metadata server, sync, webhook services
    • Updates the nixos service for metadata-webhook service to optionally use an environmentFile for secrets: cfg.environmentFile
    • Moves from std use in the nix flake to standard flake schema
    • Fixes hydra CI failures
    • Builds update-docs in hydra to avoid long local build times
    • Removes deprecated tullia
    • Removes deprecated check-hydra from pkgs
    • Removes deprecated bors files and references
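As an illustration of what obfuscating passwords in service output can look like, the sketch below masks the password field of a key=value style database connection string before it is logged. It is not the offchain-metadata-tools implementation. The connection string format, the `obfuscate_password` name, and the mask text are assumptions made for the example.

```python
import re

# Matches "password=<value>", where the value is single-quoted or unquoted.
_PASSWORD_FIELD = re.compile(r"(password\s*=\s*)(?:'[^']*'|\S+)", re.IGNORECASE)


def obfuscate_password(connection_string: str) -> str:
    """Replace any password value with a fixed mask, leaving other fields intact."""
    return _PASSWORD_FIELD.sub(r"\1********", connection_string)


# Hypothetical connection string, for illustration only.
print(obfuscate_password("host=db port=5432 user=meta password=s3cret dbname=metadata"))
```

Using a fixed-length mask avoids leaking the length of the real password in logs.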

· 3 min read
John Lotoski

High level summary

The SRE team continues work on cardano environment improvements and general environment maintenance.

Some notable recent changes, updates or improvements include:

  • The cardano-world testnets (preprod, preview, sanchonet and some private test chains) have largely completed their migration to the cardano-playground stack

Lower level summary

Capkgs

Cardano-parts

  • General migration support PR for cardano-world to cardano-playground: cardano-parts-pull-18
    • Iohk-nix and iohk-nix-ng were updated to support the migration of cardano-world networks to cardano-playground
    • Sops-secrets dependent systemd services were fixed to ensure restart upon sops secrets changes
    • Db chain utilities (db-{analyser,synthesizer,truncater}) had -ng variants created to operate on both release and pre-release network chains
    • The profile-cardano-postgres nixos module received preset variables and prepared statements via psqlrc for faster and easier analysis of network state
    • The flakeModule jobs now support the cardano-cli era command in each of the job scripts by passing the $ERA_CMD variable
    • Default cardano-node-ng package is now 8.6.0-pre; dbsync on sanchonet is now sancho-2-2-0
    • For scripts using a nix-shell shebang, the cardano-parts devShell menu can be disabled from injecting itself into stdout by passing NOMENU=true
    • Template updates include:
      • Adds optional TF AZ declaration on ec2 resources
      • Adds a cardano node p2p dashboard to the grafana cloud stack
      • Adds a dbsync pool performance analysis query
      • Updates python distribute and delegation scripts from world for playground compatibility
      • Starts a python script lib to reduce shared code among the python scripts
      • Several justfile improvements and new recipes
    • More detail is available in the PR description
  • Update submit action script for 8.6: cardano-parts-pull-19
  • Update scripts for 8.6.0-pre: cardano-parts-pull-21
    • Fixes subcommand names based on ERA_CMD
    • Adds deposits to some commands
    • Separates CC cold and hot key generation, as hot key authorization has to occur after the action is approved
    • Enables CC voting in the vote job
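The ERA_CMD handling mentioned above can be pictured as inserting an era token after `cardano-cli` when the variable is set. The sketch below is a guess at the general shape, not the cardano-parts job script logic. The `cardano_cli_argv` helper is hypothetical, and both the idea that ERA_CMD holds a bare era name such as `conway` and the fallback to no era token when it is unset are assumptions.

```python
import os
from typing import List, Mapping, Optional


def cardano_cli_argv(args: List[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build a cardano-cli argument vector, optionally prefixed with an era."""
    env = os.environ if env is None else env
    era = env.get("ERA_CMD", "").strip()
    # Assumption: an empty or unset ERA_CMD means no era token is inserted.
    prefix = ["cardano-cli"] + ([era] if era else [])
    return prefix + list(args)


# The command is only printed here, not executed.
print(cardano_cli_argv(["query", "tip"], {"ERA_CMD": "conway"}))
```

Building the argument vector in one place keeps the era choice out of individual scripts, so switching eras for a demo is a single environment variable change.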

Cardano-playground

  • Migration PR to largely complete the network migration from cardano-world to cardano-playground: cardano-playground-pull-5
    • Adds re-spun private chain network
    • Migrates shelley-qa chain network from world
    • Justfile improvements and new recipes
    • Improve concurrent environment chain support
    • More detail is available in the PR description

Iohk-nix

  • Migration to play: iohk-nix-pull-561
    • Migrate cardano-lib networks from world.dev.cardano.org to play.dev.cardano.org
    • Remove deprecated cardano-lib p2p network environment
    • Update sanchonet chain with respin changes
    • Update private chain with respin changes
    • Bump private and shelley-qa chains to sanchonet equivalent conway genesis
    • Bump preview, preprod chains to sanchonet equivalent conway genesis for node 8.6.0-pre pre-release testing

Sanchonet-demo