45 posts tagged with "performance-tracing"

· 3 min read
Michael Karg
  • Benchmarking: We worked on adjusting our infrastructure to the new 8.0 release branch and performed a (very) early run.
  • New tracing: We're profiling the new tracing system to minimize its resource footprint and guarantee high throughput.
  • Analysis pipeline: Variance analysis both for reporting and for serving as a point of comparison has been merged.
  • Infrastructure: A library for Plutus scripts will be integrated into our tooling and benchmarking profiles. Also, a profile family aimed at the tracing systems has been added.
  • Nomad backend: Various specializations of the backend are currently being implemented, along with streamlining credentials management.

Benchmarking

We have adapted our benchmarking cluster to the requirements of the 8.0 release branch. Testing runs of a very early feature branch for 8.0 helped us pinpoint an important issue in collaboration with the other teams. We look forward to gathering preliminary metrics for 8.0 soon.

Tracing

Analysis of resource usage profiles of both the legacy and new tracing system, with and without trace forwarding, has led us to gather very detailed profiling data for each possible setup. This is to ensure we keep resource usage within the node to an absolute minimum, while still providing the highest possible throughput of data for forwarding to cardano-tracer.

Additionally, we've worked on a very practically oriented document targeted at end users of the new tracing system. It provides tested step-by-step instructions for tunneling trace forwarding from a node to cardano-tracer via an easy-to-manage system service, which will match the production setup of most users.

Infrastructure & Analysis

General

Variance analysis as a full-fledged entity in our tooling has been merged. Not only is this type of analysis now part of our reporting pipeline; a variance analysis can also be fed back to serve as an additional point of comparison.
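
To make the idea concrete, here is a minimal sketch, not the actual pipeline code, of the kind of figure a variance analysis yields per metric and can feed back as a point of comparison; the module and record names are illustrative only.

```haskell
module VarianceSketch where

-- Summary of one metric across several runs.
data VarianceSummary = VarianceSummary
  { vsMean   :: Double
  , vsStdDev :: Double
  , vsCoV    :: Double  -- coefficient of variation: stdDev / mean
  } deriving Show

-- Collapse a series of samples (e.g. block adoption times from
-- repeated runs) into mean, standard deviation and relative variance.
summarise :: [Double] -> VarianceSummary
summarise xs = VarianceSummary mean stdDev (stdDev / mean)
  where
    n      = fromIntegral (length xs)
    mean   = sum xs / n
    var    = sum [(x - mean) ^ (2 :: Int) | x <- xs] / n
    stdDev = sqrt var

-- e.g. summarise [0.82, 0.79, 0.85]
```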

Furthermore, we've created a profile family for the workbench that's specifically aimed at measuring and comparing tracing system configurations.

Plutus library

We opened a PR containing a new package for benchmarking: an extensible library that holds all Plutus scripts we use in our benchmarking profiles. This will enable us, in the future, to iteratively customize any given script, and the way it is called, in the context of a specific profile. It is a refinement of the current state of affairs, where we maintain additional build inputs solely to generate a static script file tied to an external commit.
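
As a rough illustration of the intent (not the actual package), one can think of the library as a registry of parametrisable script entries that a profile looks up and specialises; the script names, record fields and redeemer logic below are all made up for the example.

```haskell
module ScriptRegistrySketch where

import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- One benchmarkable script, plus a hook for the profile to decide
-- how it is called (here: deriving the redeemer from a profile knob).
data ScriptEntry = ScriptEntry
  { scriptName :: String
  , mkRedeemer :: Int -> Integer
  }

registry :: Map String ScriptEntry
registry = Map.fromList
  [ ("loop",       ScriptEntry "loop"       fromIntegral)
  , ("ecdsa-secp", ScriptEntry "ecdsa-secp" (const 1))
  ]

-- A profile would pick a script by name and customise its call:
-- Map.lookup "loop" registry
```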

Nomad backend

The Nomad backend is being specialized in three ways: using a podman driver locally, using Nomad agents supporting nix installables, and using Nomad cloud agents. This supports having a common surface independent of the actual backend driver being used. In addition, Vault retrieval and management of cloud access credentials are being improved to minimize any friction for the backend user.

· 3 min read
Michael Karg
  • Benchmarking: We performed benchmarks for the new tracing system, and started benchmarking for varying GHC RTS configurations.
  • New tracing: Backwards compatibility with legacy tracer nomenclature has been merged; we're currently improving documentation and creating setup guidelines for end users.
  • Analysis pipeline: Our refined metrics PR has been merged. We're working on adding variance analysis to our reporting machinery.
  • Infrastructure: Support for Conway genesis in our workbench has been merged. At the moment, we're laying the groundwork for enabling GHC 9.2 in our benchmarks.
  • Open Sourcing: The API demo has reached prototype phase; work on documenting the API and providing example use cases is ongoing.
  • Nomad backend: The nomad-exec based task driver has been merged. The backend has been equipped with the capability for genesis distribution via S3 bucket.

Performance

New tracing

The new tracing system has undergone various benchmarking runs with variance analysis, and comparison to a baseline using legacy tracing. We could observe a slight shift in the resource usage profile from memory to CPU, but no regressions in block propagation metrics. Variance was observed to be notably smaller, which gives the new system much better predictability. From this angle, we consider the new system fit for production use.

GHC RTS parametrization

We're currently performing various runs on the cluster to explore the space of different GHC RTS settings for running nodes. The main focus lies on different configurations for the garbage collector, as well as increasing the number of CPU cores the node may use.
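
For readers unfamiliar with RTS tuning, the sketch below shows how such a parameter sweep could be enumerated and rendered as +RTS flags (-N for capabilities, -A for the allocation area). The particular values are placeholders for the example, not our recommended or benchmarked configuration.

```haskell
module RtsSweepSketch where

data RtsConfig = RtsConfig
  { capabilities :: Int  -- -N<n>: number of CPU cores the RTS may use
  , allocAreaMB  :: Int  -- -A<size>: nursery (allocation area) size in MiB
  }

-- Render one configuration as the RTS options a node could be started with.
renderRts :: RtsConfig -> String
renderRts c = unwords
  [ "+RTS"
  , "-N" ++ show (capabilities c)
  , "-A" ++ show (allocAreaMB c) ++ "m"
  , "-RTS"
  ]

-- A small grid of candidate settings to benchmark.
sweep :: [RtsConfig]
sweep = [ RtsConfig n a | n <- [2, 4], a <- [1, 16, 64] ]

-- mapM_ (putStrLn . renderRts) sweep  -- e.g. "+RTS -N2 -A16m -RTS"
```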

Open Sourcing

Our API demo has reached prototype stage, and operates on live data from the production database. Making use of the experience gained, we're refining version 1 of the API for better usability, and creating documentation that both describes the API endpoints and focuses on practical example use cases.

Tracing

For the new tracing system, we're currently undertaking an effort to provide multi-layered documentation: a condensed version, as well as a setup guide with a pragmatic focus, will be provided alongside the in-depth documentation. This effort should cater to different audiences, and provide distinct entry points for users of the new system, depending on their wants and needs.

Infrastructure & Analysis

General

Having included Conway genesis in the workbench, we're laying the foundation for a switch in compiler version to GHC 9.2 as a next step in future-proofing our benchmarking infrastructure. Additionally, we considered variance analysis of our runs to merit inclusion in our reporting pipeline, which will increase confidence in specific metrics.

Nomad backend

We have implemented an appropriate mechanism for genesis distribution: only after a benchmarking cluster has been deployed successfully is genesis patched and uploaded to an AWS S3 bucket for the nodes to retrieve, as a final step before initiating the actual run. We're confident that this deferred approach will provide clearer evidence for genesis patches, as well as minimize startup time for all runs by factoring in deployment retries.
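
A minimal sketch of that deferred step, assuming the upload goes through the AWS CLI; the file path, bucket name and function are placeholders for illustration, not the actual backend code.

```haskell
module GenesisUploadSketch where

import System.Process (callProcess)

-- Called only once the cluster is known to be deployed: publish the
-- freshly patched genesis so the nodes can fetch it right before the run.
distributeGenesis :: FilePath -> String -> IO ()
distributeGenesis patchedGenesis bucket =
  callProcess "aws"
    ["s3", "cp", patchedGenesis, "s3://" ++ bucket ++ "/genesis.json"]

-- e.g. distributeGenesis "run/genesis.patched.json" "benchmarking-genesis"
```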

· 2 min read
Michael Karg
  • Release benchmarking: We again performed benchmarks for the next 1.35.6 release candidate.
  • New tracing: Backwards compatibility with legacy tracer nomenclature is being implemented to smooth the transition for end users.
  • Analysis pipeline: A major refinement of benchmarking metrics has been realized, along with structural improvements regarding metrics naming.
  • Open Sourcing: Work on going live with our benchmarking data has begun, as well as creating an API demo and documentation.
  • Nomad backend: The backend was adapted to a major refactoring in workbench and is being equipped with a nomad-exec based task driver.

Performance

1.35.6 release

Benchmarking the second release candidate for 1.35.6 could again attest to a perfectly clean bill of health.

Analysis pipeline

Our analysis pipeline has seen the introduction of additional metrics, especially ones focusing on the block-producing node. They allow us to better differentiate the timing of ledger ticking and mempool snapshotting in the forging loop, a feature that promises much deeper insight into UTxO-HD performance. Additionally, a restructuring of metrics names has been undertaken along with improvements in their data dictionary; a measure that will make benchmarking data more easily accessible.
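
To give a flavour of the breakdown this enables on the block producer, here is a toy sketch, with illustrative field names that are not the pipeline's actual metric identifiers, separating ledger ticking from mempool snapshotting within one forging-loop iteration.

```haskell
module ForgingMetricsSketch where

-- Per-iteration timings on the block-producing node, in seconds.
data ForgingTimings = ForgingTimings
  { tLedgerTick      :: Double  -- time spent ticking the ledger state
  , tMempoolSnapshot :: Double  -- time spent snapshotting the mempool
  , tRemainder       :: Double  -- everything else in the iteration
  }

totalForgingTime :: ForgingTimings -> Double
totalForgingTime t = tLedgerTick t + tMempoolSnapshot t + tRemainder t

-- Fraction of the iteration spent in mempool snapshotting.
snapshotShare :: ForgingTimings -> Double
snapshotShare t = tMempoolSnapshot t / totalForgingTime t
```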

Open Sourcing

As a prerequisite for going live with our benchmarking data, we're currently working on consolidating existing analyses, so as to provide a common foundation for accessing them externally. Additionally, we've begun working on a small visualization demo and interactive API documentation. Those will enable third parties to make use of that data much more easily, by having reliable guidelines and a working example.

Tracing

The new tracing system is being outfitted with a comprehensive mapping of its structure to the legacy tracer nomenclature. This feature will make the switch to the new system as smooth as possible for end users, allowing them to gradually adapt their tooling without breaking any functionality in the process.
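
The mapping can be pictured as a lookup from the new namespace hierarchy back to flat legacy tracer names, as in the toy sketch below; the names shown are made up for illustration and are not the actual mapping shipped with the node.

```haskell
module LegacyNamesSketch where

-- The new system identifies tracers by a namespace path ...
newtype Namespace = Namespace [String] deriving (Eq, Show)

-- ... which can be mapped back onto a flat legacy name, so existing
-- tooling keeps working while users migrate gradually.
legacyNameOf :: Namespace -> Maybe String
legacyNameOf (Namespace ["ChainDB", "AddBlockEvent"]) = Just "TraceAddBlockEvent"
legacyNameOf (Namespace ["Mempool", "AddedTx"])       = Just "TraceMempoolAddedTx"
legacyNameOf _                                        = Nothing
```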

Infrastructure

Nomad backend

The Nomad backend was adapted to the latest major refactoring in workbench. Work was done on making stateful Nomad clients more autonomous, which will greatly facilitate any automation building on that backend. A task driver based on nomad-exec is currently being implemented.

· 3 min read
Michael Karg
  • SECP benchmarking: we concluded our benchmarking runs and analyses of the new SECP primitives for the Valentine hard-fork.
  • Release benchmarking: we performed a round of benchmarks for the 1.35.6 release.
  • UTxO-HD benchmarking: we performed first runs for UTxO-HD and are currently refining the benchmarking setup.
  • New tracing: for better accessibility, the new tracing system is being outfitted with introspective capabilities.
  • Infrastructure: with the Nomad cloud workbench backend we were able to perform our first test cluster runs successfully on SRE infrastructure.
  • Infrastructure: the initial NixOps workbench backend has been completed; a PR containing this work, along with many quality-of-life improvements of our tooling, got merged.

Performance

SECP

  1. For SECP, we settled on a fixed tx count per block, while simultaneously spending as much as possible of the block budget. Thus we were able to minimize the impact of per-SC-call overhead (a small arithmetic sketch follows this list).
  2. The final runs were performed with various fractions, e.g. half, of the current block budget to ascertain how these workloads would fare compared to a value-only run.
  3. The SECP machinery and profiles are currently being generalized into an approach for targeting very specific aspects of a smart contract in benchmarking.
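
The back-of-the-envelope sketch below captures the filling strategy from point 1: with a fixed number of script transactions per block, each transaction receives an equal share of a chosen fraction of the block budget. The numbers in the usage example are placeholders, not protocol parameters.

```haskell
module BlockBudgetSketch where

-- Budget available to each script call, given a fixed tx count per block.
perTxBudget
  :: Double   -- fraction of the block budget to spend, e.g. 0.5
  -> Integer  -- block execution budget (in one ExUnits dimension)
  -> Integer  -- fixed tx count per block
  -> Integer
perTxBudget fraction blockBudget txCount =
  floor (fraction * fromIntegral blockBudget) `div` txCount

-- e.g. perTxBudget 0.5 20000000000 8
```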

UTxO-HD

  1. After analyzing initial UTxO-HD runs, it turned out that mempool snapshotting had to be throttled for benchmarking; it affects a lock that UTxO-HD had to introduce into the forging loop.
  2. We're currently adapting the benchmark setup to that, and will then perform a new combination of baseline and UTxO-HD runs.

1.35.6 release

Benchmarking the 1.35.6 release candidate could attest to a perfectly clean bill of health.

Tracing

Work on the new tracing system's introspective capabilities is ongoing: immediate use cases of the new API include being able to statically validate generated tracer documentation, as well as providing information about a node's specific tracing setup via traces themselves. These features will make the new system both more robust and more accessible.

Infrastructure

Nomad backend

  1. Work on the cloud deployment capability of the Nomad workbench backend continued; for testing we can automate multiple Nomad clients.
  2. Locality assumptions were removed and job monitoring was refactored.
  3. To facilitate directly-executable derivations, Nomad job specification files are now self-contained, with the GitHub references and configs needed to run a cluster.
  4. We're currently evaluating different options for genesis distribution in said cluster.

NixOps backend

The NixOps workbench backend has reached an initial functional stage. Consequently, the relevant PR was merged. It also contained many improvements to our analysis tooling, as well as a structural overhaul of workbench itself. We consider this an important step in future-proofing our benchmarking machinery.

· 3 min read
Serge Kosyrev

High level summary

  1. SECP benchmarking: we ran several rounds of SECP benchmarks, refining the benchmark setup as we discovered the properties of the system. After formulating an initial suggested change to the protocol parameters, we're currently running what we consider the final benchmark, to validate the underlying assumptions.
  2. Release benchmarking: we've performed a round of benchmarks for the hotfix 1.35 release update and initiated the 1.35.6 benchmarks.
  3. New tracing: the improvement in the tracing API, with the underlying restructuring, was completed and merged into the node.
  4. New tracing: before going live, we're performing the documentation update, as well as reworking the end user migration guide.
  5. Open sourcing: the benchmarking data publishing has been completed and deployed. After populating it with relevant benchmark data and providing basic user documentation we can go live.
  6. Infrastructure: the cloud workbench backend is progressing well, the networking aspects of multi-region deployment are currently being worked on.
  7. Infrastructure: the NixOps workbench backend is still being worked on, as part of migration from cardano-ops and benchmarking infrastructure unification.

Performance

We are approaching the end of a chain of SECP benchmarks, having gradually eliminated deficiencies in the setup as we discovered them and answered newly appearing questions:

  • we improved the tx/block filling strategy in the generator, to maximise the per-block utilisation of resources and so better approximate the worst-case,
  • after a discovery of what looked like significant per-SC-call overhead, we again tweaked the tx/block filling strategy,
  • finally, we're redoing all benchmarks together with a value-only run against the backdrop of Mainnet-sized datasets, to balance the suggested adjustment. That also ran into difficulties with respect to limitations of our benchmarking hardware.

In addition, we started benchmarks of the 1.35.6 release.

Tracing

A rework of the new tracing system's internals and API was merged. It extended the system with introspection, which enabled a range of improvements, some of which were implemented along the way.

Specifically, we were able to completely short-cut the processing of messages generated by tracers that are made provably ineffective by the current tracing configuration. Further, ongoing work enabled by the introspection facilities includes static validation of documentation and enhanced node state reporting.
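
A simplified sketch of that short-cut, with illustrative names rather than the cardano-node API: if introspection has established that the current configuration leaves a tracer without any consumer, its messages skip all processing.

```haskell
module TraceShortcutSketch where

-- A tracer carries a flag derived once from the tracing configuration.
data Tracer m a = Tracer
  { isEffective :: Bool       -- does any configured output consume this tracer?
  , emit        :: a -> m ()
  }

-- Messages of provably ineffective tracers are dropped before any
-- formatting or forwarding work is done.
traceWith :: Applicative m => Tracer m a -> a -> m ()
traceWith tr msg
  | isEffective tr = emit tr msg
  | otherwise      = pure ()
```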

Infrastructure

On the open-sourcing/transparency front, the benchmark data publishing machinery was finally fully assembled and put online. As resources permit, we'll work on populating it with benchmarking data, preparing basic documentation, and engaging the stakeholders.

The work on the cloud deployment capability of the Nomad workbench backend continued with a focus on setting up inter-node networking and removing locality assumptions. A major step besides those was the completion of a switch-over to directly-executable derivations, which eliminates the need to create and distribute images, thereby increasing the speed of deployment.

The NixOps workbench backend progressed steadily, reaching minimal deployment capability. The remaining parts are proper shared configuration generation and the porting of the run control functionality from cardano-ops.