Skip to main content

39 posts tagged with "sre"

View All Tags

· 2 min read
John Lotoski

High level summary

The SRE team continues work on cardano environment improvements and general environment maintenance.

Some notable recent changes, updates or improvements include:

  • Cardano-node 8.7.3 is now generally deployed to all testnet and mainnet environments managed by coretech SRE

  • Dbsync and node stack process-compose jobs are now available from cardano-parts for users running nix >= 2.17.0 and nix experimental-features = nix-command flakes fetch-closure

  • These can be run with:

     nix run github:input-output-hk/cardano-parts#run-process-compose-node-stack
    nix run github:input-output-hk/cardano-parts#run-process-compose-dbsync-$NETWORK
  • For more details, see the merged cardano-parts process-compose PR

Lower level summary

Cardano-monitoring

Cardano-mainnet

  • Adds a readme, switches to nonmoving gc for producers, plus misc improvements from cardano-parts: cardano-mainnet-pull-6

Cardano-ops

  • Merged a long standing branch converting legacy mainnet relays to p2p, node -> 8.7.2, db-sync snapshots -> 13.1.1.3, and other improvements: cardano-ops-pull-417

Cardano-parts

  • Adds a readme, provides misc improvements, service optimizations, alert tuning, sql pool performance analysis fix, package updates: cardano-parts-pull-27
  • Adds process-compose dbsync and node stacks: cardano-parts-pull-28

Cardano-playground

Upstream Contributions

  • Contributions to upstream process-compose related repos were made in order to complete the process-compose dbsync and node stacks in cardano-parts, including the following:

Process-compose-flake

Services-flake

· One min read
John Lotoski

High level summary

During the lightly staffed holiday period for node SRE, the emphasis was on maintaining environment stability, tuning and resolving any noisey alerts.

Investigation into and testing around the following two topics also started during this period:

  • Ledger snapshots causing a small number of missed slots for forgers on mainnet: ouroboros-consensus-issue-868

  • A cardano-node rare file descriptor leak, with a more detailed description here

· 2 min read
John Lotoski

High level summary

The SRE team continues work on cardano environment improvements and general environment maintenance.

Some notable recent changes, updates or improvements include:

  • A new repository was created which enables agile deployment of EC2 monitoring servers, compatible with OpenTofu grafana and mimir providers: cardano-monitoring
  • The govtool backend swagger interface was nix flake packaged and deployed for Voltaire private chain testing usage
  • Grafana cloud monitoring stacks were migrated to new EC2 cardano-monitoring servers
  • Cardano-db-sync state snapshots now support client range requests, details here
  • In addition to monitoring server centralized grafana metrics, sysstat collected system metrics are now available locally on all cluster machines at high time resolution
  • Code changes required due to repository migrations to IntersectMBO have largely been completed

Lower level summary

Auth-keys-hub

Cardano-monitoring

  • A new repository enabling agile deployment of EC2 monitoring servers, compatible with OpenTofu grafana and mimir providers: cardano-monitoring

Cardano-parts

  • Migrate from grafana cloud monitoring to ec2 monitoring, add resource tagging support, declarative route53 CNAME list, and additional improvements and fixes: cardano-parts-pull-25
  • Improve ssh key handling and edge cases, resolve misc issues, add IOPS and throughput gp3 openTofu support: cardano-parts-pull-26

Cardano-playground

· 2 min read
John Lotoski

High level summary

The SRE team continues work on cardano environment improvements and general environment maintenance.

Some notable recent changes, updates or improvements include:

  • Sanchonet was respun to cardano-node 8.7.0-pre, and upgraded to cardano-node 8.7.1-pre shortly afterwards
  • Cardano-node 8.7.2 was released and all environments were then upgraded to 8.7.2
  • Cardano-parts deployed machines were upgraded to nixpkgs 23.11 and nix 2.19.3
  • Cardano-parts resource deployed environments were switched from the use of Terraform to OpenTofu

Lower level summary

Capkgs

  • Updated for cardano-node 8.7.2 and process-compose packages: capkgs-compare

Cardano-parts

Cardano-ops

Cardano-playground

Cardano-world

  • Sanchonet update PR: cardano-world-pull-111
    • Merge the long running sanchonet-updated branch
    • Migrate explorers from ziti to wireguard tunnel usage
    • Remove remaining ziti code and provisioned resources
    • Retire remaining nomad jobs in preference of the cardano-playground environments
    • Downsize the cluster in preference of the cardano-playground environments

Iohk-nix

· 4 min read
John Lotoski

High level summary

The SRE team continues work on cardano environment improvements and general environment maintenance.

Some notable recent changes, updates or improvements include:

  • The cardano-node nixos service now supports SIGHUP p2p topology reloading when the useSystemdReload option is enabled

Lower level summary

Capkgs

  • Update cardano-db-sync and offchain-metadata-tools package paths and/or references: capkgs-compare

Cardano-node

  • Optionally have cardano-node nixos service utilize SIGHUP p2p topology reload: cardano-node-pull-5537
    • Creates a useSystemdReload bool option for the cardano-node nixos service
    • This will move the topology file(s) to /etc/cardano-node/topology-$i.yaml and inject systemd reload hooks for p2p configured cardano-node instances
    • Moving topology files to /etc also allows for manual topology updates when a quick test is needed and full service re-deployment isn't desired

Cardano-parts

  • Adds a metadata server profile and a number of other features and improvements: cardano-parts-pull-20
    • Adds a new metadata-service profile
    • Adds metadata service and pkg configuration options for cardano-groups to utilize the metadata-server profile
    • Adds a cardano-webserver profile for multiple virtualHosts and TLS ACME server aliases for a cluster's static needs, with each cached behind varnish
    • Adds extra node list producers and public producers for cardano-node-topology profile
    • Adds delegation amounts to cardano-postgres psql prepared query show_pools_block_history_in_epoch
    • Adds select systemd metrics reporting to grafana-agent profile
    • Adds a bookRelay multivalue DNS option to disambiguate with groupRelay multivalue DNS
    • Adds an opsLib library to the cardano-parts lib flakeModule and refactors some common code into it
    • Adds support for sops secret traversing from target path up instead of cwd up, thereby supporting secrets use-cases outside of the repo
    • Adds job-gen-env-config for both release and pre-release configuration files to support configuration book generation
    • Adds support for grafana recording rules in the template files
    • Improves cardano-group profile handling of producers with respect to multiple instance nodes
    • Improves grafana-agent profile metrics handling for multi-instance cardano-node servers
    • Improves smash service preStart handling while waiting for a node socket
    • Updates Justfile for ERA_CMD demo support
    • Migrates default grafana cloud node exporter, varnish alert and recording rules to grafana alert and recording rule templates
    • Defaults to using an updated systemd reload nixos service feature for p2p networks in cardano-group profile
    • Defaults cardano-postgres profile psqlrc use to false

Cardano-playground

  • Adds a new testnet metadata server, cluster webserver, and other improvements: cardano-playground-pull-6
    • Adds a new metadata server
    • Adds a new webserver for the cluster's static virtualhost needs
    • Adds support for sops secret traversing from target path up instead of cwd up, thereby supporting secrets use-cases outside of the repo
    • Adds systemd metrics monitoring to the cluster
    • Resizes sanchonet machines to support the growing chain
    • Completes migration of preprod from world
    • Updates groups to utilize both bookRelay and groupRelay multivalue DNS attributes
    • Updates Justfile for ERA_CMD demo support
    • Defaults to using an updated systemd reload nixos service feature for p2p networks in cardano-group profile
    • Migrates book static code to playground from world, with refactor, cleanup and updates
    • Migrates default grafana cloud node exporter, varnish alert and recording rules to declarative grafana alert and recording rules

Offchain-metadata-tools

  • Adds db password option with obfuscation plus misc improvements: offchain-metadata-tools-pull-61
    • Adds db password connection option and obfuscates passwords in output for metadata server, sync, webhook services
    • Updates the nixos service for metadata-webhook service to optionally use an environmentFile for secrets: cfg.environmentFile
    • Moves from std use in the nix flake to standard flake schema
    • Fixes hydra CI failures
    • Builds update-docs in hydra to avoid long local build times
    • Removes deprecated tullia
    • Removes deprecated check-hydra from pkgs
    • Removes deprecated bors files and references