Skip to main content

SRE Team Update

· 3 min read
John Lotoski

High level summary

The SRE team continues work on Cardano environment improvements and general maintenance.

Some notable recent changes, updates or improvements include:

  • The preprod network was hard forked to Conway era.

  • The nixosModule profile-blockperf in cardano-parts now includes prometheus metrics, automatically scraped with grafana-agent along with a dashboard.

  • A nixosModule profile-tcpdump in cardano-parts is now available to push on-going pcaps to s3 for historical reference.

  • Old dev environments were cleaned up and retired after the completion of the ouroboros-network-ops cluster migration to the cardano-parts stack.

  • Causes of blockperf indicated mainnet relay delayed block headers were investigated and improved with adjustments to RTS parameters and machine class.

  • Conway-era mempool log volume increase was investigated and resolved with ouroboros-network improvements.

  • Scaling capability was added to the cardano-mainnet bootstrap cluster.

Repository Work

Cardano Parts

  • Sets cardano-db-sync (release) to 13.4.0.0. Includes nixosModule improvements to cardano-db-sync snapshots module with a manual trigger, blockperf module new prom metrics, grafana-agent module with auto-blockperf scrape config and a new tcpdump module for persistent pcaps to s3. Recipe improvements for configuration consistency checking and openTofu improved AMI and DNS filtering have been made. The AWS machine reference spec has been updated and one alert tuned for better sensitivity. More detail is available in the PR description: cardano-parts-pull-46

Cardano-mainnet

  • Deploys cardano-db-sync (release) to 13.4.0.0. Deploys nixosModule improvements for cardano-db-sync snapshots module with a manual trigger, blockperf module with new prom metrics, grafana-agent module with auto-blockperf scrape config and a new tcpdump module for persistent pcaps to s3. Recipes improvements for configuration consistency checking and openTofu improved AMI and DNS filtering have been made. Makes changes to pool group relays to eliminate or reduce delayed block headers. Tests additional dev patches for missingBlock errors. Adds bootstrap cluster scaling capability and a bootstrap cluster dashboard. Improvements made in cardano-parts PR#46 are included in this PR. More detail is available in the PR description: cardano-mainnet-pull-20

Cardano-ops (Legacy Mainnet)

  • Over a two week period the legacy relay nodes were scaled down 50% further from the recent machine quantity peak. commit-compare

Cardano-playground

  • Preprod was hard-forked to Conway. Deploys cardano-db-sync to 13.4.0.0. Recipe improvements for configuration consistency checking and openTofu improved AMI and DNS filtering have been made. Improvements made in cardano-parts PR#46 are included in this PR. More detail is available in the PR description: cardano-playground-pull-30

Cardano-world

  • Updates openssh to 9.8p1 on remaining cardano-world (soon-to-be-retired) cluster machines commit