
SRE Team Update

· 2 min read
John Lotoski
Service Reliability Engineer

High level summary

The SRE team continues work on Cardano environment improvements and general maintenance.

Some notable recent changes, updates or improvements include:

  • Preparation for the 10.7.0 pre-release is underway. SRE is working on integrating kes-agent and dmq-node into the release binaries, the node NixOS service and the OCI containers, as appropriate. CI tests for the Consensus db-tooling (i.e., db-analyser, db-truncater, db-synthesizer) are being added to a NixOS test run on Hydra to ensure that the bundled node version and db-tools version remain compatible.

  • Iterative deployments of 10.7.0 pre-release candidates to select pre-release environments are ongoing, with issues being reported back to developers.

  • Darwin CI build machine updates are underway, along with optimizations and fixes to reduce flaky Darwin platform bugs and noisy alerts, and a refactor to reduce code complexity. A number of these improvements will appear in the next SRE biweekly update.

  • Loki logging has been added to more of our cardano-parts environments (i.e., cardano-playground and cardano-mainnet). Custom Loki dashboards are also being prepared to improve the Loki experience and will appear in the next cardano-parts PR.

Repository Work -- Merged

Cardano-monitoring

cardano-monitoring PR#4:

  • Adds Loki to playground, mainnet and networkteam monitoring servers
  • Raises max_outstanding_per_tenant to accommodate large dashboards without errors

cardano-monitoring PR#5:

  • Adjusts Loki log retention to a per-environment setting

Devx-ci

devx-ci PR#144:

  • Increases the nofile soft and hard limits to avoid failures in builds with higher open-file requirements, such as virtiofs virtualized images
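
For readers unfamiliar with nofile limits, the following is a minimal Python sketch of how a process can inspect its soft and hard open-file limits and raise the soft limit up to the hard limit. It is an illustration only and is not how the PR applies the change; the function names `nofile_limits` and `raise_soft_nofile` are hypothetical and do not come from devx-ci.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_nofile(target):
    """Raise the soft nofile limit to `target`, capped at the hard limit.

    Returns the soft limit in effect afterwards. The limit is never lowered.
    Hypothetical helper for illustration only.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unprivileged process cannot exceed its hard limit.
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    # Nothing to do if the soft limit is unlimited or already high enough.
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"nofile soft={soft} hard={hard}")
```

A build that opens many files at once fails once it hits the soft limit, even when the hard limit is higher, which is why both values matter. Raising the hard limit itself generally requires privileges and is normally done in system or service configuration rather than from inside the process.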

Repository Work In Progress -- PRs and Branches