High level summary
The SRE team continues work on Cardano environment improvements and general maintenance.
Some notable recent changes, updates or improvements include:
- Preview network was hard forked to Conway era.
- Cardano-db-sync was updated to
13.4.0.0
across all environments.
Repository Work
Cardano Airgap
Commit-compare
- Update the image to cardano-cli to
9.2.1.0
and credential-manager to HEAD. - Finish testing the airgap image using ext4 partitions and add ventoy to the devShell.
Cardano Parts
Node 9.1.0, Mithril 2430.0, Chang readiness
Overview
Sets cardano-node (release) and cardano-node-ng (pre-release) versions to 9.1.0
and mithril to 2430.0. Includes nixosModule improvements for the new tracing
system, a new template-clone recipe, various recipe improvements and fixes. A
Chang readiness query has been added.
Details
- Important versioning updates:
- cardano-node and cardano-node-ng are now
9.1.0
- Bumps capkgs:
- For node releases of
9.1.0
- For mithril
2430.0
and mithril-unstable
- Improves the
profile-cardano-node-new-tracing
nixosModule and new tracing system in general by better cleaning residual legacy config items, restructuring the options for more flexibility in config composition, and configuring the new tracing system to log close to parity volume with the legacy tracing system when using UTXO-HD in memory mode. - Adds additional default Tcp and TcpExt metrics to the
profile-grafana-agent
nixosModules metrics scrape list - Adds
curl
and the pre-push
script to the default cardano-parts devShell - Adds template alert
cardano_node_elevated_restarts
- Adds a new template recipe of
template-clone
for when downstream users know they simply want to mirror upstream templates rather than diff or patch them - Adds a new template sql script with a Chang era readiness sql query
- Improves template recipe
dbsync-pool-analyze
and sql query with parameters to query any CTE in the large dbsync-pool-perf
sql from cli - Improves template recipe
dbsync-pool-analyze
to handle queries that result in no non-performing pools - Improves template recipe
dedelegate-pools
with a mempool query instead of a fixed time to handle UTxO on-chain settlement - Removes template recipes that are now mostly cardano-playground specific
- Fixes template dashboard for cardano-node legacy and new tracing application metrics to always show the full environment KES period
- Fixes template recipe
apply-bootstrap
- Fixes outdated service option name and db-sync snapshot schema description
Cardano-mainnet
Node 9.1.0, Mithril 2430.0, Bp scheduled restart module
Schedule restart initial prototyping
Overview
Sets cardano-node version to 9.1.0
and mithril to 2430.0
. Adds block
producer scheduled restart capability.
Details:
- Bumps cardano-parts for:
- Important versioning updates:
- cardano-node and cardano-node-ng are now
9.1.0
- Capkgs updates:
- For node releases of
9.1.0
- For mithril
2430.0
and mithril-unstable
- KES rotates mainnet block producers
- Optimizes bootstrap nodes for
-N4
RTS usage - Adds
cardano-node-schedule-restart
nixosModule and associated perSystem packages - Adds new alerts for
cardano_node_elevated_restarts
- Fixes dashboard for
cardano-node
legacy and new tracing application metrics to always show the full environment KES period
Cardano-ops (Legacy Mainnet)
Commit-compare
Over a two week period the legacy relay nodes were scaled down to running only
one instance of cardano-node per machine and then the number of running
machines was further reduced by 25%.
Cardano-playground
Node 9.1.0, Mithril 2430.0, Preview hardfork to Conway
Overview:
Sets cardano-node (release) and cardano-node-ng (pre-release) versions to
9.1.0
and mithril to 2430.0
. Hard forks preview network to Conway. Adds
recipe and other improvements, including to the pool performance query recipe
interface and a Chang readiness query.
Details:
- Bumps cardano-parts for:
- Important versioning updates:
- cardano-node and cardano-node-ng are now
9.1.0
- Capkgs updates:
- For node releases of
9.1.0
- For mithril
2430.0
and mithril-unstable
- Adds a new template sql script with a Chang era readiness sql query
- Adds a babbage-to-conway cost model to the Cardano book
- Adds a new recipe
kes-rotate
for easy kes rotation - Adds new alerts for
cardano_node_elevated_restarts
- Adds a commit stamp marker for Cardano book updates
- Adds a new
template-clone
recipe for mirroring upstream template files when diffing or patching isn't needed - Updates the Cardano book environment for cardano-node
9.1.0
- Updates the explainer docs for kes-rotation, chain-manipulation, new-network
- Updates the preview faucet for new govtool operations
- Rotates sanchonet KES, resizes the metadata server
- Investigates mempool rejections with new tracing system and modified logging
- Tests a comparison set of machines in mainnet environment for node
9.1.0
and utxo-hd-9.0
- Tests a new tracing system branch for metrics renaming and KES metrics update calculations
- Moves some cardano-playground specific recipes to the
scripts/recipes-custom.just
module - Improves template recipe
dbsync-pool-analyze
and sql query with parameters to query any CTE in the large dbsync-pool-perf
sql from cli - Improves template recipe
dbsync-pool-analyze
to handle queries that result in no non-performing pools - Improves template recipe
dedelegate-pools
with a mempool query instead of a fixed time to handle UTxO on-chain settlement - Hard forks preview environment to Conway and resizes one relay member of each preview pool group
Iohk-nix
Add conway config for mainnet/preprod/preview
Devx-ci
Fix Hydra alerting immediately on no data
- Migrate from Grafana Cloud to our self-hosted cardano-monitoring stack
- Do not filter metrics to keep down number of unique series
- This allows unlimited collection of metrics from our CI machines for better alerts and measurements for ongoing performance tuning.
- Upgrade disko partition names manually on remaining machines ci{2,3,4,6,7,8} so boot does not break on the next deployment
- Grafana dashboard: fix memory usage graph
- Add alerts
Cardano-monitoring
Commit-compare
- Add preliminary support for Loki for log collection