
75 posts tagged with "sre"


SRE Team Update

· 4 min read
John Lotoski
Service Reliability Engineer

High level summary

The SRE team continues work on cardano environment improvements and general environment maintenance.

Some notable recent changes, updates or improvements include:

  • The cardano-node nixos service now supports SIGHUP p2p topology reloading when the useSystemdReload option is enabled

Lower level summary

Capkgs

  • Update cardano-db-sync and offchain-metadata-tools package paths and/or references: capkgs-compare

Cardano-node

  • Optionally have cardano-node nixos service utilize SIGHUP p2p topology reload: cardano-node-pull-5537
    • Creates a useSystemdReload bool option for the cardano-node nixos service
    • This will move the topology file(s) to /etc/cardano-node/topology-$i.yaml and inject systemd reload hooks for p2p configured cardano-node instances
    • Moving topology files to /etc also allows for manual topology updates when a quick test is needed and full service re-deployment isn't desired
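
The reload-on-SIGHUP pattern described above can be sketched in a few lines. This is only an illustration of the general mechanism, not cardano-node's implementation: the `TopologyHolder` class, the `reload_demo` function, the JSON file contents and the temporary path are all assumptions made for the example, and it is Unix-only (Python 3.8+).

```python
import json
import signal
import tempfile
from pathlib import Path


class TopologyHolder:
    """Holds the parsed topology and re-reads it from disk on request."""

    def __init__(self, path):
        self.path = Path(path)
        self.topology = {}
        self.reload()

    def reload(self, signum=None, frame=None):
        # The (signum, frame) parameters let this double as a signal handler.
        self.topology = json.loads(self.path.read_text())


def reload_demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for /etc/cardano-node/topology-$i.yaml.
        path = Path(tmp) / "topology.json"
        path.write_text(json.dumps({"producers": ["relay-a"]}))

        holder = TopologyHolder(path)
        previous = signal.signal(signal.SIGHUP, holder.reload)
        try:
            # Edit the file in place, as an operator might for a quick test.
            path.write_text(json.dumps({"producers": ["relay-a", "relay-b"]}))
            # A service reload would deliver SIGHUP; here the process signals itself.
            signal.raise_signal(signal.SIGHUP)
        finally:
            # Restore whatever handler was installed before the demo.
            signal.signal(signal.SIGHUP, previous)
        return holder.topology
```

The point of the design is that the process keeps running while its peer list changes, which is why keeping the file at a stable, editable path matters.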

Cardano-parts

  • Adds a metadata server profile and a number of other features and improvements: cardano-parts-pull-20
    • Adds a new metadata-service profile
    • Adds metadata service and pkg configuration options for cardano-groups to utilize the metadata-server profile
    • Adds a cardano-webserver profile for multiple virtualHosts and TLS ACME server aliases for a cluster's static needs, with each cached behind varnish
    • Adds extra node list producers and public producers for cardano-node-topology profile
    • Adds delegation amounts to cardano-postgres psql prepared query show_pools_block_history_in_epoch
    • Adds select systemd metrics reporting to grafana-agent profile
    • Adds a bookRelay multivalue DNS option to disambiguate with groupRelay multivalue DNS
    • Adds an opsLib library to the cardano-parts lib flakeModule and refactors some common code into it
    • Adds support for sops secret traversing from target path up instead of cwd up, thereby supporting secrets use-cases outside of the repo
    • Adds job-gen-env-config for both release and pre-release configuration files to support configuration book generation
    • Adds support for grafana recording rules in the template files
    • Improves cardano-group profile handling of producers with respect to multiple instance nodes
    • Improves grafana-agent profile metrics handling for multi-instance cardano-node servers
    • Improves smash service preStart handling while waiting for a node socket
    • Updates Justfile for ERA_CMD demo support
    • Migrates default grafana cloud node exporter, varnish alert and recording rules to grafana alert and recording rule templates
    • Defaults to using an updated systemd reload nixos service feature for p2p networks in cardano-group profile
    • Defaults cardano-postgres profile psqlrc use to false
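
The "target path up" sops lookup mentioned in the list above can be pictured as a walk from the secret file's own directory toward the filesystem root. The sketch below is an assumption about the general idea, not the cardano-parts code: the function name `find_sops_config` and the `.sops.yaml` default are illustrative.

```python
from pathlib import Path


def find_sops_config(start, name=".sops.yaml"):
    """Return the nearest `name` at or above `start`, or None if absent."""
    start = Path(start).resolve()
    # Begin at the target's directory rather than the current working directory.
    first = start if start.is_dir() else start.parent
    for directory in [first, *first.parents]:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None
```

Because the search starts at the target rather than at the cwd, a secret that lives in a different checkout or outside any repo still resolves to the config file nearest to it.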

Cardano-playground

  • Adds a new testnet metadata server, cluster webserver, and other improvements: cardano-playground-pull-6
    • Adds a new metadata server
    • Adds a new webserver for the cluster's static virtualhost needs
    • Adds support for sops secret traversing from target path up instead of cwd up, thereby supporting secrets use-cases outside of the repo
    • Adds systemd metrics monitoring to the cluster
    • Resizes sanchonet machines to support the growing chain
    • Completes migration of preprod from world
    • Updates groups to utilize both bookRelay and groupRelay multivalue DNS attributes
    • Updates Justfile for ERA_CMD demo support
    • Defaults to using an updated systemd reload nixos service feature for p2p networks in cardano-group profile
    • Migrates book static code to playground from world, with refactor, cleanup and updates
    • Migrates default grafana cloud node exporter, varnish alert and recording rules to declarative grafana alert and recording rules

Offchain-metadata-tools

  • Adds db password option with obfuscation plus misc improvements: offchain-metadata-tools-pull-61
    • Adds db password connection option and obfuscates passwords in output for metadata server, sync, webhook services
    • Updates the nixos service for metadata-webhook service to optionally use an environmentFile for secrets: cfg.environmentFile
    • Moves from std use in the nix flake to standard flake schema
    • Fixes hydra CI failures
    • Builds update-docs in hydra to avoid long local build times
    • Removes deprecated tullia
    • Removes deprecated check-hydra from pkgs
    • Removes deprecated bors files and references
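
As a rough illustration of the password obfuscation item above, the sketch below masks the `password` field of a libpq-style keyword/value connection string before it is printed. How offchain-metadata-tools actually does this is not stated in the post; the `obfuscate` helper, the mask text and the sample connection string are assumptions.

```python
import re

# Matches "password=<value>" where the value runs to the next whitespace.
_PASSWORD_RE = re.compile(r"(password=)(\S+)", re.IGNORECASE)


def obfuscate(conn_str, mask="***"):
    """Return conn_str with any password value replaced by `mask`."""
    return _PASSWORD_RE.sub(lambda m: m.group(1) + mask, conn_str)


print(obfuscate("host=db port=5432 user=meta password=s3cret dbname=meta"))
```

Quoted passwords containing spaces would need a fuller parser; the sketch only covers the unquoted case.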

SRE Team Update

· 3 min read
John Lotoski
Service Reliability Engineer

High level summary

The SRE team continues work on cardano environment improvements and general environment maintenance.

Some notable recent changes, updates or improvements include:

  • Cardano-world testnets of preprod, preview, sanchonet and some private test chains have largely completed their migration to the cardano-playground stack

Lower level summary

Capkgs

Cardano-parts

  • General migration support PR for cardano-world to cardano-playground: cardano-parts-pull-18
    • Iohk-nix and iohk-nix-ng were updated to support the migration of cardano-world networks to cardano-playground
    • Sops-secrets dependent systemd services were fixed to ensure restart upon sops secrets changes
    • Db chain utilities (db-{analyser,synthesizer,truncater}) had -ng variants created to operate on both release and pre-release network chains
    • The profile-cardano-postgres nixos module received preset variables and prepared statements via psqlrc for faster and easier analysis of network state
    • The flakeModule jobs now has support for the cardano-cli era command in each of the job scripts by passing the $ERA_CMD variable
    • Default cardano-node-ng package is now 8.6.0-pre; dbsync on sanchonet is now sancho-2-2-0
    • For scripts using a nix-shell shebang, the cardano-parts devShell menu can be disabled from injecting itself into stdout by passing NOMENU=true
    • Template updates include:
      • Adds optional TF AZ declaration on ec2 resources
      • Adds a cardano node p2p dashboard to the grafana cloud stack
      • Adds a dbsync pool performance analysis query
      • Updates python distribute and delegation scripts from world for playground compatibility
      • Starts a python script lib to reduce shared code among the python scripts
      • Several justfile improvements and new recipes
    • More detail is available in the PR description
  • Update submit action script for 8.6: cardano-parts-pull-19
  • Update scripts for 8.6.0-pre: cardano-parts-pull-21
    • Fixes subcommand names based on ERA_CMD
    • Adds deposits to some commands
    • Separates CC cold/hot key generation, as hot key authorization has to occur after the action is approved
    • CC voting enabled in vote job

Cardano-playground

  • Migration PR to largely complete the network migration from cardano-world to cardano-playground: cardano-playground-pull-5
    • Adds re-spun private chain network
    • Migrates shelley-qa chain network from world
    • Justfile improvements and new recipes
    • Improve concurrent environment chain support
    • More detail is available in the PR description

Iohk-nix

  • Migration to play: iohk-nix-pull-561
    • Migrate cardano-lib networks from world.dev.cardano.org to play.dev.cardano.org
    • Remove deprecated cardano-lib p2p network environment
    • Update sanchonet chain with respin changes
    • Update private chain with respin changes
    • Bump private and shelley-qa chains to sanchonet equivalent conway genesis
    • Bump preview, preprod chains to sanchonet equivalent conway genesis for node 8.6.0-pre pre-release testing

Sanchonet-demo

SRE Team Update

· 2 min read
John Lotoski
Service Reliability Engineer

High level summary

The SRE team continues work on cardano environment improvements and general environment maintenance.

Some notable recent changes, updates or improvements include:

  • Cardano-world testnets of preprod, preview, sanchonet and some private test chains are in the process of being migrated to the cardano-playground stack

Lower level summary

Capkgs

  • Adds offchain-metadata-tools, dbsync sanchonet updates: capkgs-compare

Cardano-parts

  • General package updates, module improvements and template recipes to support network migration from world to playground: cardano-parts-pull-17
    • Bumps cardano-db-sync-ng to sancho-2-0-0 tag
    • Bumps iohk-nix-ng to mig-sancho branch for sanchonet pool migration from world to play
    • Adds more machine system bins and devShell bins for scripting and debug purposes
    • Adds cardano-show-kes-period alias on any node machine importing profile-cardano-node-group module
    • Adds profile-cardano-node-topology module for a simplified interface to most common topology needs
    • Adds a job-delegate-rewards-stake-key job as an optional follow on to pool creation and registration jobs
    • Adds a topology function to filter self from group machines with an allowList for matching infixes
    • Adds metadata-server and related offchain-metadata-tools bins from capkgs
    • Updates justfile template with:
      • a new query-all recipe for getting status of multiple concurrent running environments
      • a new set-default-cardano-env recipe for fast switching between environments
      • a new start-demo recipe for forking a custom env into conway
      • a new start-node recipe for generic environment start
      • a new stop-node recipe for generic environment stop
      • updated list-machines recipe for handling of empty nixos machine config and empty ssh_config conditions
      • updated query-tip recipe to a generic query tip compatible with each environment
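
The topology function listed above, which filters a machine out of its own group and applies an allowList of matching infixes, can be sketched as follows. The real function lives in Nix in cardano-parts; this Python version, the `topology_peers` name and the machine names are hypothetical and only show the filtering logic.

```python
def topology_peers(self_name, group_machines, allow_infixes=None):
    """Return group machines other than self, optionally limited by infix."""
    # Drop the machine itself so a node never lists itself as a peer.
    peers = [m for m in group_machines if m != self_name]
    if allow_infixes:
        # Keep only names containing at least one allowed infix.
        peers = [m for m in peers if any(infix in m for infix in allow_infixes)]
    return peers


machines = [
    "preview1-bp-a-1",
    "preview1-rel-a-1",
    "preview1-rel-b-1",
    "preview1-dbsync-a-1",
]
print(topology_peers("preview1-rel-a-1", machines, ["-bp-", "-rel-"]))
```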

Cardano-playground

SRE Team Update

· 3 min read
John Lotoski
Service Reliability Engineer

High level summary

The SRE team continues work on cardano environment improvements and general environment maintenance.

Some notable recent changes, updates or improvements include:

  • Sanchonet environment was updated to 8.5.0-pre.
  • Cardano-parts now supports cardano-db-sync, cardano-smash, cardano-faucet and grafana cloud monitoring

Lower level summary

Capkgs

  • Various improvements and fixes including:
    • Add GHA cron schedule
    • Add nix auto-gc to avoid running out of storage during large package set builds
    • Add new packages to capkgs
    • Reduce runner storage requirement leaving more room for builds
    • Restructure capkgs attribute names to avoid package name collisions
    • Return to non-musl builds for cardano packages to retain journald compatibility
    • Update cache usage from file level to folder level to reduce network and latency overhead
    • Commit diff: capkgs-compare

Cardano-parts

  • Updates cardano-node-ng to 8.5.0-pre and adds a Conway era automation job: cardano-parts-pull-16
  • Dbsync, smash, faucet and more: cardano-parts-pull-15
    • Adds cardano-db-sync, cardano-faucet, cardano-postgres, cardano-smash, profiles and/or services and related changes
    • Adds nginx vhost metrics exporter profile
    • Adds smash registered-relay-dump service and exporter for use until legacy relay nodes are scaled down
    • Adds bash *-ng autocompletion compatible wrappers
    • Adds a list-machines just recipe using nushell dataframe outer joins and scj ssh_config parser for fast cluster evals of machine state overview
    • Adds downstream grafana cloud dashboard as templates
    • Adds downstream grafana cloud alerts as templates
    • Updates grafana-agent profile with new exporter scrape hooks: cardano-db-sync, cardano-faucet, nginx-vts, varnish
    • Updates the basic profile with IOG cache and commonly used bins
    • Updates the pre-release profile to support cardano-db-sync, cardano-faucet, cardano-smash *-ng versioning
    • Updates flakeModule jobs with new conway era automation and additional IO encryption shimming and file type checks
    • Updates .sops.yaml template for supporting faucet secrets, workbench secrets, state-demo secrets
    • Updates the Justfile template with terraform fixes for workspace switching and provider auto-reconfiguration
    • Updates the cloudFormation terraformState template with stack modifications to preserve all resources in case of deletion
    • Updates the colmena template with dbsync, smash, faucet machines profiles and roles
    • Improves prior cardano-postgres modules to now automatically tune pg parameters based on machine cpuCount, memMiB and desired conns
    • Bumps capkgs node-ng to 8.5.0-pre
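
To make the postgres auto-tuning item above concrete, here is a sketch that derives a few settings from cpuCount, memMiB and the desired connection count. The formulas are common community rules of thumb (for example, roughly 25% of memory for `shared_buffers`), not the ones the cardano-postgres module uses; the `tune_postgres` function and its ratios are assumptions.

```python
def tune_postgres(cpu_count, mem_mib, max_conns):
    """Derive a few PostgreSQL settings from machine size (rule-of-thumb values)."""
    shared_buffers = mem_mib // 4          # about 25% of RAM
    effective_cache = mem_mib * 3 // 4     # about 75% of RAM
    # Split the remaining memory across connections, with a small floor.
    work_mem = max(4, (mem_mib - shared_buffers) // (max_conns * 3))
    return {
        "max_connections": str(max_conns),
        "shared_buffers": f"{shared_buffers}MB",
        "effective_cache_size": f"{effective_cache}MB",
        "work_mem": f"{work_mem}MB",
        "max_worker_processes": str(cpu_count),
        "max_parallel_workers": str(cpu_count),
    }


for key, value in tune_postgres(cpu_count=8, mem_mib=16384, max_conns=100).items():
    print(f"{key} = {value}")
```

Deriving the values from machine attributes means a resized machine picks up matching parameters on its next deploy instead of needing hand edits.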

Cardano-playground

  • Dbsync, smash, faucet and more: cardano-playground-pull-3
    • Adds a list-machines just recipe using nushell dataframe outer joins and scj ssh_config parser for fast cluster evals of machine state overview
    • Adds dbsync, smash, faucet machines and corresponding metrics exporters, dashboards and alerts
    • Renames the flake.cardano-parts.cluster.group attrSet to groups to reflect its plurality and match the corresponding upstream change
    • Optimizes machine sizes
    • Updates .sops.yaml for supporting faucet secrets, workbench secrets, state-demo secrets
    • Updates the cloudFormation terraformState file with stack modifications to preserve all resources in case of deletion
    • Updates the cluster isNg definition to support cardano-db-sync, cardano-faucet, cardano-smash *-ng versioning
    • Updates the Justfile with terraform fixes for workspace switching and provider auto-reconfiguration
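
The list-machines recipe above relies on outer joins so that a machine known to only one source still appears in the overview. The recipe itself uses nushell dataframes; the plain-Python sketch below swaps in dictionaries to show the same idea, and the `outer_join` helper, column names and sample rows are hypothetical.

```python
def outer_join(left, right, key):
    """Full outer join of two lists of dicts on `key`; missing cells are None."""
    # Collect every column name, preserving first-seen order.
    columns = []
    for row in [*left, *right]:
        for col in row:
            if col not in columns:
                columns.append(col)
    left_by = {row[key]: row for row in left}
    right_by = {row[key]: row for row in right}
    joined = []
    for name in sorted(left_by.keys() | right_by.keys()):
        merged = dict.fromkeys(columns)
        merged.update(left_by.get(name, {}))
        merged.update(right_by.get(name, {}))
        joined.append(merged)
    return joined


nixos_cfg = [{"name": "rel-a-1", "size": "m5a.large"}, {"name": "rel-b-1", "size": "m5a.large"}]
ssh_cfg = [{"name": "rel-a-1", "ip": "192.0.2.1"}, {"name": "old-1", "ip": "192.0.2.9"}]
for row in outer_join(nixos_cfg, ssh_cfg, "name"):
    print(row)
```

Rows with a `None` cell are the interesting ones: they flag a machine that is declared but not reachable, or reachable but no longer declared.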

Cardano-world

Sanchonet-demo

  • Update for cardano-node 8.5.0, conway job recipes and cardano-parts interface changes: sanchonet-demo-commit

SRE Team Update

· 2 min read
John Lotoski
Service Reliability Engineer

High level summary

The SRE team continues work on cardano environment improvements and general environment maintenance.

Some notable recent changes, updates or improvements include:

  • Sanchonet environment was re-spun starting from slot 7171200 and updated to cardano-node 8.4.0-pre.
  • The use of cardano-node docker hub will be deprecated in favor of GHCR

Lower level summary

Capkgs

  • Refactor parsing scripts, add github action automation, various bugfixes and cleanup: capkgs-compare

Cardano-parts

  • Updates secrets layout scheme, adds sops enc/dec for jobs, adds cloud monitoring profile, updates flake templates and other improvements/fixes: cardano-parts-pull-8

Cardano-playground

  • Updates for new cardano-parts secrets handling and layout, TF workspace handling, group multivalue DNS support, grafana cloud monitoring and other improvements: cardano-playground

Cardano-world