6 posts tagged with "sre"

· One min read
John Lotoski

2023-04 - 2023-06

Main achievements

In addition to ongoing general maintenance and support of cardano environments, main SRE achievements for this quarter include:

  • Expanding the darwin CI cluster and adding aarch64 builder support
  • Adding bare metal capability to bitte clusters
  • Creating a devx-ci cluster, containing a Hydra build server and a Linux build farm, which is intended to replace Cicero functionality
  • Creation of pool performance analysis queries and scripting
  • Migration of testnet metadata server to cardano-world
  • Cardano shelley qa migration to cardano-world
  • Cardano sanchonet environment spin up to test Conway era functionality
  • Mainnet relay conversion to p2p topology is progressing, with 50% of mainnet relays now using the p2p topology and networking features

Next steps

  • Continue with the conversion of mainnet to using p2p topology

· One min read
John Lotoski

2023-07 - 2023-09

Main achievements

In addition to ongoing general maintenance and support of cardano environments, main SRE achievements for this quarter include:

  • Completion of mainnet relay networking conversion to p2p topology
  • Cardano sanchonet environment respins for testing new cardano-node pre-release Conway era functionality
  • Stabilization of cardano-explorer in cardano-world using high IOPS bare metal machines
  • Creation of a nix content addressed packages repository, capkgs:
    • To provide lightweight release binaries thereby avoiding sluggish nix flakes and devShells
  • Creation of a cardano performance benchmarking cluster, cardano-perf:
    • To replace legacy cluster benchmark tooling
  • Creation of a cardano cluster composition repository, cardano-parts:
    • For enabling multi-cluster, multi-role cardano network deployments
  • Creation of a cardano testnets repository, cardano-playground:
    • Utilizing cardano-parts for testnet deployments
  • Creation of a sanchonet demo repository, sanchonet-demo:
    • Utilizing cardano-parts for fast sanchonet test environment and demo purposes

Next steps

  • Continue with migration of cardano-world testnets to cardano-playground
  • Proceed with spinup of mainnet p2p bootstrap cluster
  • Scale down mainnet non-p2p legacy cluster at the appropriate time

· One min read
John Lotoski

2023-10 - 2023-12

Main achievements

In addition to ongoing general maintenance and support of cardano environments, main SRE achievements for this quarter include:

  • Cardano-parts support was added for cardano-db-sync, cardano-smash, cardano-faucet, cardano-metadata and grafana monitoring, along with a number of other features

  • Completed migration of testnets from cardano-world to the cardano-playground cluster

  • Completed migration of the cardano book from cardano-world to the cardano-playground cluster

  • Completed migration of pools from cardano-ops to the cardano-mainnet cluster

  • Creation of a mainnet p2p bootstrap cluster

  • Cardano sanchonet environment respins during the quarter for testing new cardano-node pre-release Conway era functionality

  • All environments were upgraded to cardano-node 8.7.2 or 8.7.3 by the end of the quarter

  • Completion of a govtool backend deployment for Voltaire chain testing

  • Creation of a monitoring repository, cardano-monitoring:

    • A new repository enabling agile deployment of EC2 monitoring servers, compatible with OpenTofu grafana and mimir providers

Next steps

  • Scale down the mainnet non-p2p legacy cluster

  • Add deployment support for new network services, such as Mithril

  • Continue cardano-parts and operations improvements

· 2 min read
John Lotoski

2024-01 - 2024-03

Main achievements

In addition to ongoing general maintenance and support of cardano environments, main SRE achievements for this quarter include:

  • All cardano release environments, including preview, preprod, mainnet legacy and mainnet new clusters, were upgraded through cardano-node releases 8.7.3 and 8.9.0, and finally to 8.9.1 by the end of March

  • All cardano pre-release environments, including sanchonet, private chain, and shelley-qa clusters, were upgraded through cardano-node releases 8.7.3, 8.8.0-pre, 8.8.1-pre and 8.9.0, and finally to 8.9.1 by the end of March

  • Sanchonet and private chain environments were each re-spun once during this quarter to support new pre-release versions of cardano-node in the Conway era

  • Cardano-parts added a cardano-db-sync process-compose stack for each environment

  • Cardano-parts added a cardano-node process-compose stack for each environment

  • Cardano-parts added enhancements for topology-related nixos modules and functions to accommodate new bootstrapPeer functionality, new topology attributes and more complex network deployments

  • Cardano-parts added support for mithril signers integrated with block producers and a mithril-signer-verifier service for monitoring

  • Sanchonet, preview, preprod and mainnet IOG block producers are now signing mithril certificates

  • Cardano-parts added support for mithril clients in the nixos cardano-node systemd service, the process-compose job stacks and the nix cardano-node entrypoint, all of which also require any mithril snapshot to be signed by a trusted IOG pool prior to use

  • Cardano-parts added ip integration tooling so that, similar to other deployer tools like nixops, nixosConfigurations possess ip information that can be used in module configuration

  • Migration of the cardano-db-sync snapshots server from the legacy mainnet cluster to the new mainnet cluster, including a rewrite of the snapshot service, was completed

  • Cardano metadata server migration to the Cardano Foundation was completed

  • BlockPerf, a cardano-node performance monitoring tool, was integrated into the new mainnet cluster relays

  • Cardano-node bootstrapPeer functionality was added with node 8.9.x, requiring effort to align nixos service module code between cardano-node nixos services, iohk-nix topology generation, cardano-ops legacy code and cardano-parts module compatibility, as well as feature testing under various edge cases

  • Cardano-playground added govtool backend support for the private chain voltaire testing team

Next steps

  • Add support for the new cardano-node metrics system

  • Add IPv6 cardano-parts support

  • Extend govtool frontend and backend to a process-compose stack

  • Adapt network spin-up tooling for the new create-testnet-data cardano-cli command

  • Continue cardano-parts and operations improvements

· 2 min read
John Lotoski

2024-04 - 2024-06

Main achievements

In addition to ongoing general maintenance and support of cardano environments, main SRE achievements for this quarter include:

  • All cardano release environments, including preview, preprod, mainnet legacy and mainnet new clusters, were upgraded through cardano-node releases 8.9.2, 8.9.3, 8.9.4 and 8.12.1, and finally to 8.12.2 by the end of June

  • Cardano pre-release environments additionally iterated through pre-release upgrades of 8.11.0-pre and 8.12.0-pre, and finally to 8.12.2 by the end of June, with the exception of sanchonet, which remains pinned at 8.11.0-pre until the next respin to support node version 9.0.0 or greater

  • Sanchonet environment was re-spun twice for pre-release Conway testing of cardano-node versions 8.10.0-pre and 8.11.0-pre, respectively

  • Private chain environment was re-spun three times to support fast epoch Conway testing

  • Ten operations-oriented documents were added to the cardano-playground and cardano-mainnet repos for knowledge transfer

  • Block producers that participate in mithril signing now produce metrics, which can be scraped by the default metrics agent

  • A cluster spin-up job to utilize the new cardano-cli create-testnet-data sub-command was created

  • A nixosModule, dashboards and alerts were added supporting the new cardano tracing system

  • Many new operations scripts and features were added, including a template diff and patch recipe to pull the latest cardano-parts improvements to consuming repositories more easily

Next steps

  • Finalize support for the new cardano-node tracing system once the service is rewritten for general consumption

  • Add IPv6 cardano-parts support

  • Extend govtool frontend and backend to a process-compose stack once govtool is publicly buildable again

  • Continue cardano-parts and operations improvements

· 3 min read
John Lotoski

2024-07 - 2024-09

Main achievements

In addition to ongoing general maintenance and support of cardano environments, SRE achievements for this quarter include:

  • All IOE cardano-parts supported node environments, including preview, preprod, sanchonet, mainnet and other clusters, were upgraded through cardano-node releases 9.0.0, 9.1.0, 9.1.1 and 9.2.0, and finally to 9.2.1 by the end of September.

  • All IOE cardano-parts supported node environments had dual stack ipv4/ipv6 capability added and configured. This included supporting scripts and recipes, module updates, terraform/OpenTofu resource changes, and software updates to make previously ipv6 incompatible software ipv6 compatible (for example, cardano-faucet). Cardano-parts clusters can now seamlessly participate in ipv6 cardano-node traffic and other ipv6 traffic.

  • Preview, preprod and mainnet networks were hard forked to Conway.

  • Legacy mainnet cluster shelley-era high-load relays were scaled down over the quarter and then stopped, now that p2p has removed the need for them.

  • Legacy cardano explorer was retired, and the Cardano Foundation now provides the replacement landing page, which links to several community explorers.

  • Cardano-smash production load was retired from equinix metal hosting in the cardano-world repo and transferred to the new cardano-mainnet cluster.

  • New cardano-mainnet cluster scaling capability was added for the bootstrap machines. Block performance analysis was used to tune RTS parameters on the bootstraps and other mainnet pool machines.

  • Sanchonet environment was re-spun for cardano-node 9.1.0 and greater compatibility.

  • Private chain was stopped and re-spun with 2 hr epochs for testing.

  • New nixosModules were added to cardano-parts and cardano-playground, including: profile-blockperf, profile-tcpdump (for saving node traffic pcaps to s3) and ogmios.

  • Documentation for playground and mainnet cluster operations was improved, including documents on: debugging of peer-to-peer connections; governance voting with the playground stakepools; faucet setup; faucet pool de-delegation; and mainnet dbsync cardano-snapshot operations. See the docs/explain directory of both the cardano-playground and cardano-mainnet repos for details.

  • The cardano-monitoring repository received substantial documentation and other improvements, and now also serves as the home for devx-ci metrics after migration away from Grafana cloud hosting.

  • An improved cardano-airgap image for secure signing operations was created and made available.

  • Hydra CI performance was improved with changes to our custom Nix evaluator and optimized resource usage while waiting for IFDs (import-from-derivation).

Next steps

  • Add a production protocol-parameters cardano-api based server to facilitate community transaction creation without requiring a live node.

  • Migrate from deprecated grafana agent to grafana alloy.

  • Finalize support for the new cardano-node tracing system once the service is rewritten for general consumption.

  • Extend govtool frontend and backend to a process-compose stack once govtool is publicly buildable again.

  • Continue cardano-parts and operations improvements.